Abstract. We prove that uniqueness of the stationary chain compatible with an attractive regular probability kernel is equivalent to the following two assertions for this chain: (1) it is a finitary coding of an i.i.d. process with discrete state space, (2) the concentration of measure holds at exponential rate. We show in particular that if a stationary chain is uniquely defined by a kernel which is continuous and attractive, then this chain can be sampled using a coupling-from-the-past algorithm. For the original Bramson-Kalikow model we further prove that there exists a unique compatible chain if and only if the chain is a finitary coding of a finite alphabet i.i.d. process. Finally, we obtain some partial results on conditions for phase transition for general chains of infinite order.



1. Introduction 



In this work we consider chains of infinite order on a finite alphabet, which are families of processes specified by kernels of transition probabilities that can depend on the whole past. This general class of processes includes the finite order Markov chains as special cases, but also includes stochastic models that exhibit phase transitions. An important question in the area is "what properties distinguish kernels exhibiting phase transition from kernels satisfying uniqueness?". This work gives necessary and sufficient conditions for the existence of phase transition for an important class of chains of infinite order, namely the attractive regular chains.

A probability kernel is called regular when it is continuous with respect to the past and has no null transition probabilities. The regularity of the kernel guarantees the existence of at least one chain compatible with the kernel. Attractiveness means that there exists some monotonicity property for the transition probabilities, and it is analogous to the attractiveness of specifications considered in statistical mechanics (Preston 1976).



It came as some surprise when Bramson & Kalikow (1993) showed an example of a family of regular and attractive kernels with more than one compatible chain. Several interesting results are based on this Bramson-Kalikow (BK) example. For instance, Quas (1996) used the BK example to construct an expanding map of the circle which preserves Lebesgue measure such that the system is ergodic, but not weak-mixing. Also using the BK example, Stenflo (2001) gave a counterexample to a conjecture raised by Karlin (1953). Lacroix (2000) obtained some simplifications of the Bramson & Kalikow (1993) proof of phase transition, and Hulse (2006) gave a different example of a regular and attractive kernel exhibiting phase transition, using proof ideas similar to those of Lacroix (2000). To the best of our knowledge, Berger et al. (2005) exhibited the only non-attractive example of a non-unique chain of infinite order, and it is not based on the BK example. Despite the importance of these works, none of them gives general sufficient conditions for the existence of phase transition, even for the special (but important) case of attractive kernels.

In the present work, we prove that, for attractive regular kernels, uniqueness of the stationary chain compatible with a kernel is equivalent to each of the following two assertions: (1) the compatible chain is a finitary coding of a discrete-value i.i.d. process, (2) the concentration of measure holds at exponential rate. Equivalence (1) means that, for attractive regular chains, uniqueness implies that the compatible chain is a factor of a discrete-value i.i.d. process whose mapping depends almost surely on a finite number of coordinates with respect to the measure of the i.i.d. process. Equivalence (2) means that phase transition yields the loss of "good" concentration of measure. We show that the regularity of the kernel is essential for these equivalences to hold and, in general, cannot be relaxed. We also obtain some partial results for non-regular and non-attractive cases, which are of independent interest.

The main ingredient for the proof of the existence of a finitary coding is the proof that uniqueness of the compatible chain for regular and attractive kernels is equivalent to the existence of a coupling-from-the-past (CFTP) perfect simulation algorithm. This class of simulation algorithms was first introduced for Markov chains by Propp & Wilson (1996) and then generalized to several other stochastic models. Interestingly, despite its simplicity, our algorithm can generate samples of continuous and attractive chains in regimes where this was previously not known to be possible.

Finally, from a coding point of view, it is interesting to have a finitary coding from a finite alphabet i.i.d. process. We show that this is possible in the original Bramson & Kalikow (1993) example if and only if there is a unique compatible chain, that is, if and only if the parameters of the model are chosen in such a way that uniqueness holds. The proof of this fact is done by explicitly constructing a coding function that yields a finitary coding from an i.i.d. chain with finite alphabet to an i.i.d. chain with countable alphabet, provided the latter has finite entropy. This result and the proof technique are of independent interest.

This article is organized as follows. In Section 2 we introduce the notation, definitions and necessary background, with examples, to state the main results in Section 3. In Section 4 we discuss the results and some of their implications. In Section 5 we introduce the Attractive Sampler, which is used to prove the theorems of Section 3. Finally, in Section 6 we prove all the results.



2. Notation, standard definitions and examples 

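Notation. For any set $\mathcal U$ we denote the sets of bi-infinite, left-infinite and finite sequences of symbols of $\mathcal U$ by $\mathcal U^{\mathbb Z}$, with $\mathbb Z = \{\dots,2,1,0,-1,-2,\dots\}$, $\mathcal U^{-\mathbb N}$, with $-\mathbb N = \{-1,-2,\dots\}$, and $\mathcal U^{\star} = \bigcup_{j\ge1}\mathcal U^{\{-j,\dots,-1\}}$, respectively. The elements of these sets will be denoted, respectively, $\mathbf u = \dots u_2u_1u_0u_{-1}u_{-2}\dots$, $\underline u = u_{-1}u_{-2}\dots$ and $u_{-k}^{-1} = u_{-1}u_{-2}\dots u_{-k}$ for any $1\le k<+\infty$.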

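In the present paper $\mathcal U$ is some Polish space and, unless otherwise specified, $A$ is the finite ordered set $\{1,2,\dots,s\}$, called the alphabet. We define a partial order on $A^{-\mathbb N}$ by saying that $\underline a\le\underline b$ whenever $a_{-i}\le b_{-i}$ for any $i\ge1$. In $A^{-\mathbb N}$, the maximal element is the constant past $\mathbf s = ss\dots$, and the minimal element is the constant past $\mathbf 1 = 11\dots$.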
Chains of infinite order. A probability kernel, or simply kernel, $P$ on alphabet $A$ is a function
$$P: A\times A^{-\mathbb N}\to[0,1],\qquad (a,\underline x)\mapsto P(a\mid\underline x),$$
such that
$$\sum_{a\in A}P(a\mid\underline x) = 1,\qquad\forall\,\underline x\in A^{-\mathbb N}.$$



We say that a stationary stochastic chain $\mathbf X = (X_j)_{j\in\mathbb Z}$ (of stationary law $\mu$) on $A$ is compatible with a kernel $P$ if the latter is a regular version of the conditional probabilities of the former, that is,
$$\mu\big(X_0 = a\mid X_{-\infty}^{-1} = \underline x\big) = P(a\mid\underline x)$$
for every $a\in A$ and $\mu$-a.e. $\underline x$ in $A^{-\mathbb N}$. When there is more than one stationary chain compatible with $P$, we say that there is phase transition; otherwise we say that the chain is unique. We follow the Harris nomenclature (Harris 1955) and call these chains chains of infinite order. These processes were first introduced by Onicescu & Mihoc (1935) under the name chaînes à liaisons complètes. The existence of an invariant measure for these chains was first studied by Doeblin & Fortet (1937). They were rediscovered several times under different names (Harris 1955, Keane 1972, Kalikow 1990). For a comprehensive historical account and recent developments we refer the reader to Fernandez & Maillard (2005).



Non-nullness, continuity rate, oscillations and attractiveness. We say that a kernel $P$ is strongly non-null if
$$\inf_{a\in A,\ \underline x\in A^{-\mathbb N}}P(a\mid\underline x)>0.$$

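The continuity rate (or variation) of order $k$ of a kernel $P$ is
$$\mathrm{var}_k := \sup_{a_{-k}^{-1}\in A^k}\ \sup_{\underline x,\underline y\in A^{-\mathbb N}}\ \sup_{b\in A}\big|P(b\mid a_{-k}^{-1}\underline x) - P(b\mid a_{-k}^{-1}\underline y)\big|.$$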

We say that $P$ is continuous if $\lim_{k\to\infty}\mathrm{var}_k = 0$. A compactness argument shows that there always exists at least one stationary chain compatible with a continuous kernel (see for example Keane (1972)). If $P$ is strongly non-null and continuous, we say that $P$ is a regular kernel. Another characterization of kernels is the oscillation rate:
$$\mathrm{osc}_n := \sum_{a\in A}\mathrm{osc}_n(a),\quad\text{where}\quad\mathrm{osc}_n(a) := \sup\big\{|P(a\mid\underline x) - P(a\mid\underline y)| : \underline x,\underline y\in A^{-\mathbb N},\ x_{-i} = y_{-i}\ \forall i\neq n\big\}.$$

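The sequences $(\mathrm{var}_k)_{k\ge0}$ and $(\mathrm{osc}_k)_{k\ge0}$ are related to the uniqueness of the compatible stationary chain, as we will see in the examples below.

Finally, we say that a kernel $P$ on $A$ is attractive if, for all $a\in A$, the value of $\sum_{j\ge a}P(j\mid\underline x)$ is non-decreasing in $\underline x\in A^{-\mathbb N}$ with respect to the partial order $\le$ defined above.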
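For kernels that depend on finitely many past symbols, attractiveness can be checked by brute force over all ordered pairs of pasts. The following sketch is only an illustration for such finite-memory kernels, not a procedure for general chains of infinite order. It assumes a hypothetical interface `kernel(past)` returning the list $[P(1\mid\text{past}),\dots,P(s\mid\text{past})]$ for a past $(x_{-1},\dots,x_{-m})$ of fixed length $m$; `majority_kernel` is likewise a hypothetical example.

```python
from itertools import product


def is_attractive(kernel, s, m, tol=1e-12):
    """Brute-force attractiveness check for a kernel with memory m on {1, ..., s}.

    kernel(past) must return [P(1 | past), ..., P(s | past)] for a tuple
    past = (x_{-1}, ..., x_{-m}).  The kernel is attractive when every upper
    tail sum G(a) = sum_{j >= a} P(j | past) is non-decreasing in the past
    for the coordinatewise partial order.
    """
    pasts = list(product(range(1, s + 1), repeat=m))
    tails = {}
    for past in pasts:
        probs = kernel(past)
        tails[past] = [sum(probs[a - 1:]) for a in range(1, s + 1)]
    for x in pasts:
        for y in pasts:
            if all(xi <= yi for xi, yi in zip(x, y)):  # x <= y
                if any(gx > gy + tol for gx, gy in zip(tails[x], tails[y])):
                    return False
    return True


def majority_kernel(past):
    """Hypothetical example on {1, 2}: favour the symbol forming a majority of the past."""
    p2 = 0.8 if sum(v == 2 for v in past) * 2 > len(past) else 0.2
    return [1.0 - p2, p2]


print(is_attractive(majority_kernel, s=2, m=3))  # True
```

Reversing the roles of the two symbols (so that a past rich in $2$'s makes $2$ less likely) breaks attractiveness, and the check then returns `False`.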



Let us give two important examples taken from the literature, which we will repeatedly use in 
the sequel to illustrate our assertions. 
Binary autoregressive models. These models are extensively used in the statistical literature (McCullagh & Nelder 1983), and are defined through the following parameters: a continuously differentiable increasing function $\psi:\mathbb R\to\,]0,1[$, a summable sequence of non-negative real numbers $(\theta_n)_{n\ge1}$, and a non-negative real parameter $\gamma\ge0$. Consider the class of kernels $P$ on alphabet $\{-1,+1\}$ such that



$$P(a\mid\underline x) := \psi\Big(a\sum_{n\ge1}\theta_n x_{-n} - a\gamma\Big).$$



These kernels are attractive and regular. The continuity follows from the fact that $\psi$ is continuous. The strong non-nullness is due to the definition of $\psi$ and the fact that $\sum_{n\ge1}\theta_n<+\infty$. They are also attractive because for any ordered pair $\underline x\le\underline y$ we have $\sum_{n\ge1}\theta_n x_{-n}\le\sum_{n\ge1}\theta_n y_{-n}$, and since $\psi$ is an increasing function, it follows that $P(+1\mid\underline x)\le P(+1\mid\underline y)$.

If $\psi$ is Lipschitz continuous, then one directly obtains $\mathrm{var}_k\le C\sum_{n>k}\theta_n$ and $\mathrm{osc}_n\le C\theta_n$ for some positive constant $C$. By the criteria of Johansson & Oberg (2003) and Fernandez & Maillard (2005), respectively, the conditions $\theta_n\le Cn^{-\alpha}$ with $\alpha>3/2$, and $\sum_{n\ge1}\theta_n$ sufficiently small, each imply uniqueness of the stationary chain compatible with $P$.

An important example of such models is obtained with $\psi(r) = e^{r}(e^{r}+e^{-r})^{-1}$. The resulting kernel is called the logit model in the statistics literature, and the one-sided one-dimensional long-range Ising model in the statistical physics literature. For instance, Hulse (2006) used this model to give an example of phase transition in chains of infinite order.
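The attractiveness and oscillation claims above can be illustrated numerically for the logit model. For this choice, $\psi(r) = e^{r}(e^{r}+e^{-r})^{-1} = 1/(1+e^{-2r})$ satisfies $\psi(r)+\psi(-r) = 1$ and $\sup_r\psi'(r) = 1/2$. Flipping $x_{-n}$ changes the argument of $\psi$ by $2\theta_n$, so $\mathrm{osc}_n(a)\le\theta_n$ and $\mathrm{osc}_n\le2\theta_n$, that is, $C = 2$ works here. The sketch assumes, for illustration only, that $\gamma = 0$ and that $(\theta_n)$ is truncated to finitely many terms. The specific sequence $\theta_n = 0.5/n^2$ is a hypothetical choice.

```python
import math
import random


def psi(r):
    """Logistic link psi(r) = e^r / (e^r + e^{-r}) = 1 / (1 + e^{-2r})."""
    return 1.0 / (1.0 + math.exp(-2.0 * r))


def logit_kernel(a, past, theta):
    """P(a | past) for a in {-1, +1}; past[n - 1] plays the role of x_{-n}.

    Assumes gamma = 0 and a finite list theta (a truncation of (theta_n)).
    """
    field = sum(t * x for t, x in zip(theta, past))
    return psi(a * field)


theta = [0.5 / n ** 2 for n in range(1, 21)]
rng = random.Random(0)
x = [rng.choice([-1, 1]) for _ in range(20)]
y = [1] * 20  # the maximal past, so x <= y coordinatewise
print(logit_kernel(1, x, theta) <= logit_kernel(1, y, theta))  # True
```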



The Bramson & Kalikow (1993) example. Consider an increasing function $\phi:[-1,+1]\to\,]0,1[$, an increasing sequence of positive integers $(m_j)_{j\ge1}$ and a sequence $(\lambda_j)_{j\ge1}$ such that $\lambda_j>0$ and $\sum_{j\ge1}\lambda_j = 1$. We call the BK example the class of kernels defined on $A = \{-1,+1\}$ by
$$P(+1\mid\underline x) = \sum_{j\ge1}\lambda_j\,\phi\Big(\frac{1}{m_j}\sum_{i=1}^{m_j}x_{-i}\Big).$$
The kernels of this class are attractive and regular. Attractiveness and strong non-nullness follow directly from the definition of $\phi$, and simple calculations yield $\mathrm{var}_n\le\sum_{j:\,m_j>n}\lambda_j$, showing that $P$ is indeed continuous as well. When $\phi(s) = (1-\varepsilon)\mathbf 1\{s>0\}+\varepsilon\mathbf 1\{s<0\}$ for some $\varepsilon\in(0,1)$ and $\lambda_k = cr^k$ for some $r\in(2/3,1)$ and $c = (1-r)/r$, we call this model the original BK example, as it is precisely the model introduced in Bramson & Kalikow (1993). They proved that, taking the sequence $(m_k)_{k\ge1}$ increasing sufficiently fast, the kernel $P$ exhibits phase transition.
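The bound $\mathrm{var}_n\le\sum_{j:\,m_j>n}\lambda_j$ can be checked numerically on a truncated version of the BK kernel. Changing the symbols beyond position $n$ only affects the averages with $m_j>n$, and each of these moves $P(+1\mid\underline x)$ by at most $\lambda_j(1-2\varepsilon)\le\lambda_j$. The sketch keeps only finitely many pairs $(\lambda_j,m_j)$, an assumption made for illustration, so $\sum_j\lambda_j$ need not equal $1$. It uses odd $m_j$, so that the averages never vanish, and takes $\phi$ as in the original BK example. The parameter values are hypothetical.

```python
import random


def bk_plus(past, lambdas, ms, phi):
    """Truncated BK kernel: P(+1 | past) = sum_j lambda_j * phi(mean of x_{-1..-m_j}).

    past[i - 1] plays the role of x_{-i} and must have length >= max(ms).
    """
    return sum(lam * phi(sum(past[:m]) / m) for lam, m in zip(lambdas, ms))


def var_bound(n, lambdas, ms):
    """Upper bound on var_n: total weight of the averages that look beyond position n."""
    return sum(lam for lam, m in zip(lambdas, ms) if m > n)


def phi(s, eps=0.1):
    """Original BK choice of phi; the m_j below are odd, so s is never 0."""
    return 1 - eps if s > 0 else eps


lambdas, ms = [0.4, 0.3, 0.2, 0.1], [1, 3, 9, 27]
rng = random.Random(1)
n = 5
x = [rng.choice([-1, 1]) for _ in range(27)]
y = x[:n] + [rng.choice([-1, 1]) for _ in range(27 - n)]  # agrees with x on x_{-1..-n}
print(abs(bk_plus(x, lambdas, ms, phi) - bk_plus(y, lambdas, ms, phi)) <= var_bound(n, lambdas, ms))  # True
```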



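Maximum and minimum phases for attractive kernels. Define, for any $\underline x\in A^{-\mathbb N}$, the fixed past chain $\mathbf X^{\underline x}$ by
$$X_n^{\underline x} = \begin{cases} x_n & \text{if } n<0,\\ a\ \text{ w.p. } P\big(a\mid X_{n-1}^{\underline x}\dots X_0^{\underline x}\,\underline x\big) & \text{otherwise.}\end{cases}\tag{1}$$
$\mathbf X^{\underline x}$ is the non-stationary chain obtained by fixing the past $\underline x$ at all negative times and "running $P$ from this past". For any $\underline x$, $j\ge1$, and $n\in\mathbb Z$, let $\mathbf X^{\underline x,j}$ be the process defined by $X_n^{\underline x,j} := X_{n+j}^{\underline x}$.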

The attractiveness of $P$ implies that for $\underline x = \mathbf 1$ and $\underline x = \mathbf s$, the sequence of processes $(\mathbf X^{\underline x,j})_{j\ge1}$ is stochastically non-decreasing and non-increasing, respectively (Hulse 1991), and therefore the weak limits
$$\mathbf X^{\min} := \lim_{j\to\infty}\mathbf X^{\mathbf 1,j}\quad\text{and}\quad\mathbf X^{\max} := \lim_{j\to\infty}\mathbf X^{\mathbf s,j}\tag{2}$$
exist and are stationary. If $P$ is continuous, then $\mathbf X^{\min}$ and $\mathbf X^{\max}$ are compatible with $P$. We call $\mathbf X^{\min}$ the minimum phase and $\mathbf X^{\max}$ the maximum phase.
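The monotonicity behind (2) can be seen through a simple coupling. The sketch below assumes a kernel with finite memory $m$ on $\{1,\dots,s\}$; the hypothetical interface `kernel(past)` returns $[P(1\mid\text{past}),\dots,P(s\mid\text{past})]$. It drives the fixed past chains $\mathbf X^{\mathbf 1}$ and $\mathbf X^{\mathbf s}$ with the same uniforms through the inverse tail update $c = \max\{c' : u<\sum_{j\ge c'}P(j\mid\text{past})\}$. For an attractive kernel this update is non-decreasing in the past, so the trajectory started from $\mathbf 1$ stays below the one started from $\mathbf s$ at every time. The two-state example kernel is hypothetical.

```python
import random


def tail_update(probs, u):
    """Largest c in {1..s} with u < sum_{j >= c} probs[j - 1]; u is uniform on [0, 1)."""
    c, tail = 1, 1.0
    while c < len(probs):
        tail -= probs[c - 1]  # tail is now G(c + 1)
        if u < tail:
            c += 1
        else:
            break
    return c


def fixed_past_run(kernel, m, start, uniforms):
    """Run X^x from the constant past x = (start, start, ...) with the given uniforms."""
    history = [start] * m
    for u in uniforms:
        past = tuple(reversed(history[-m:]))  # (x_{-1}, ..., x_{-m})
        history.append(tail_update(kernel(past), u))
    return history[m:]


def markov_kernel(past):
    """Hypothetical attractive example with m = 1: P(2 | 1) = 0.3, P(2 | 2) = 0.6."""
    p2 = 0.6 if past[0] == 2 else 0.3
    return [1.0 - p2, p2]


rng = random.Random(2)
us = [rng.random() for _ in range(1000)]
low = fixed_past_run(markov_kernel, 1, 1, us)
high = fixed_past_run(markov_kernel, 1, 2, us)
print(all(a <= b for a, b in zip(low, high)))  # True
```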

Finitary process. Let the shift operators $T_{\mathcal U}$ and $T_A$ act, respectively, on $\mathcal U^{\mathbb Z}$ and $A^{\mathbb Z}$ by shifting the sequences by one unit: $T_{\mathcal U}(\mathbf u) = (u_{j+1})_{j\in\mathbb Z}$ and $T_A(\mathbf a) = (a_{j+1})_{j\in\mathbb Z}$. A stationary process $\mathbf X$ (with stationary law $\mu$) on alphabet $A$ is a stationary coding of a stationary process $\mathbf U$ (with stationary law $\mathbb P$) on $\mathcal U$ if there exists a measurable function $\Phi:\mathcal U^{\mathbb Z}\to A^{\mathbb Z}$ which is translation equivariant (that is, $\Phi(T_{\mathcal U}(\mathbf U)) = T_A(\Phi(\mathbf U))$) and such that $\mu = \mathbb P\circ\Phi^{-1}$. We follow the nomenclature of Shields (1996) and call B-processes the processes that are stationary codings of an i.i.d. process. We have a finitary coding if there exists a stopping time $\theta:\mathcal U^{\mathbb Z}\to\mathbb N\cup\{\infty\}$, $\mathbb P$-a.s. finite, such that
$$[\Phi(\mathbf U)]_0 = [\Phi(\mathbf V)]_0\quad\text{whenever}\quad U_{-\theta(\mathbf U)}^{\theta(\mathbf U)} = V_{-\theta(\mathbf U)}^{\theta(\mathbf U)}.\tag{3}$$
This last assumption amounts to saying that the event $\{\theta(\mathbf U) = k\}$ is $\mathcal F(U_{-k}^{k})$-measurable, or in other words, that the stopping time $\theta$ can be checked by looking only at a finite number of $U_i$'s. We call finitary processes (FP) the processes that are finitary codings of an i.i.d. process. The notion of stationary coding comes from ergodic theory, and has a one-sided analogue in the literature of stochastic processes, called the coupling-from-the-past algorithm (CFTP algorithm in the sequel).



Such algorithms (which were first introduced in Propp & Wilson (1996) for Markov chains) aim to construct the function $\Phi$ using only the past values of an i.i.d. process $\mathbf U$ and the kernel $P$. If a CFTP algorithm is possible for a given kernel $P$, then the stationary measure $\mu$ which is constructed is a FP, because it is a particular finitary coding of $\mathbf U$, for which the event $\{\theta(\mathbf U) = k\}$ is $\mathcal F(U_{-k}^{0})$-measurable. In the literature, a process is sometimes called finitary only if the set $\mathcal U$ is finite. We do not assume this. In the special case of $\mathcal U$ finite (or discrete), we say that the process is a finitary coding of a finite (discrete) alphabet i.i.d. process.
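A toy example may help fix these definitions. In the hypothetical coding below, $\mathbf U$ is i.i.d. uniform on $\{0,1,2\}$ and $[\Phi(\mathbf U)]_i$ is the first nonzero value found when looking backward from time $i$. The number of steps looked back is a stopping time $\theta$: the event $\{\theta = k\}$ depends only on $U_{i-k},\dots,U_i$, and $\theta$ is geometric, hence a.s. finite. This is a (trivial) one-sided finitary coding of the type produced by CFTP algorithms. It is not related to any specific kernel in this paper.

```python
def code_at(u, i):
    """Hypothetical one-sided finitary coding: first nonzero u[j] for j = i, i-1, ...

    Returns (symbol, theta), where theta is the number of steps looked back;
    the event {theta = k} is determined by u[i - k], ..., u[i] alone.
    """
    k = 0
    while u[i - k] == 0:
        k += 1
        if i - k < 0:
            raise ValueError("window too short to decide the coding at time i")
    return u[i - k], k


u = [2, 0, 0, 1, 0, 2]
print(code_at(u, 4))  # (1, 1)
```

Translation equivariance means that shifting the input sequence shifts the output sequence, which the rule above satisfies by construction.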

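Exponential concentration of measure. Let $f:A^n\to\mathbb R$ be measurable. Define
$$\delta_j f := \sup\{|f(a_1^n) - f(b_1^n)| : a_i = b_i,\ \forall i\neq j\}$$
and let $\delta f$ be the vector with $j$-th coordinate given by $\delta_j f$. We say that the concentration of measure holds at exponential rate for a stationary process $\mathbf X$ if, for all $n\ge1$ and all functions $f$ such that $\|\delta f\|_{\ell^1(\mathbb N)}\le\gamma<\infty$, we have
$$\mathbb P\big(|f(X_1^n) - \mathbb E[f(X_1^n)]|>\varepsilon\big)\le\exp\Big(-\frac{g(\varepsilon,\gamma)}{\|\delta f\|^2_{\ell^2(\mathbb N)}}\Big),$$
with $g(\varepsilon,\gamma)>0$. In particular, the above inequality implies that for all $k\in\mathbb N$ and $h:A^k\to[0,1]$ we have
$$\mathbb P\Big(\Big|\frac{1}{n-k+1}\sum_{j=0}^{n-k}h(X_{j+1}^{j+k}) - \mathbb E[h(X_1^k)]\Big|>\varepsilon\Big)\le e^{-n\,g_k(\varepsilon)},\tag{4}$$
where $g_k(\varepsilon)>0$.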
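For intuition on the quantities $\delta_j f$, the brute-force sketch below computes them by enumeration for small $n$ and $s$. For the empirical mean $f(a_1^n) = \frac1n\sum_{i=1}^n a_i$ on $\{1,\dots,s\}^n$ one gets $\delta_j f = (s-1)/n$ for every $j$. Hence $\|\delta f\|_{\ell^1(\mathbb N)} = s-1$ and $\|\delta f\|^2_{\ell^2(\mathbb N)} = (s-1)^2/n$, so the concentration bound decays like $\exp(-n\,g(\varepsilon,\gamma)/(s-1)^2)$, that is, exponentially in $n$. The function names are illustrative only.

```python
from itertools import product


def delta(f, n, s):
    """delta_j f = sup |f(a) - f(b)| over a, b in {1..s}^n that differ only at coordinate j."""
    result = []
    for j in range(n):
        best = 0.0
        for a in product(range(1, s + 1), repeat=n):
            for c in range(1, s + 1):
                b = a[:j] + (c,) + a[j + 1:]
                best = max(best, abs(f(a) - f(b)))
        result.append(best)
    return result


def mean(a):
    """Empirical mean of a finite sequence of symbols."""
    return sum(a) / len(a)


d = delta(mean, n=4, s=3)
print(d)  # [0.5, 0.5, 0.5, 0.5]
print(sum(x * x for x in d))  # 1.0, which equals (s - 1)**2 / n
```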



3. Main results 

Theorem 1. Let P be an attractive regular kernel. The following are equivalent: 
(1) There exists a unique chain compatible with $P$.

(2) $\mathbf X^{\max}$ is a finitary coding of a discrete-value i.i.d. process.

(3) The ergodic theorem holds at exponential rate for $\mathbf X^{\max}$.



From Hulse (1991) we know that the maximum phase (resp. the minimum phase) is always a B-process, regardless of whether it is equal to or different from the minimum phase (resp. the maximum phase). Therefore, being a B-process does not distinguish the presence or absence of phase transition. Theorem 1 shows that, for a regular attractive kernel $P$, the property of being a finitary coding of an i.i.d. process does distinguish between the existence and the absence of phase transition.

Another interesting consequence of the above theorem is that, for an attractive and regular kernel exhibiting phase transition, the ergodic theorem for the maximum phase holds only at a subexponential rate.

We now show that Theorem 1 is optimal in the class of attractive chains, in the sense that if we relax either continuity or strong non-nullness, we can find examples of stationary chains that are FP and satisfy the ergodic theorem at exponential rate, although they are not uniquely determined by their conditional probabilities.

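Relaxing the strong non-nullness assumption. The following example shows that, in general, we cannot relax the strong non-nullness condition. Before giving our example, we need some more definitions. For any $\underline x\in\{-1,+1\}^{-\mathbb N}$, let
$$\ell(\underline x) := \min\{j\ge0 : x_{-j-1} = -1\}.$$
When we look backward in $\underline x$, $\ell(\underline x)$ counts the number of $+1$'s before finding the first $-1$. We use the convention that $\ell(\mathbf{+1}) = \infty$, where $\mathbf{+1}$ is the past with all symbols equal to $+1$. Let $(p_i)_{i\ge0}$ be a monotonically decreasing sequence of $[0,1]$-valued real numbers, and let $p_\infty := \lim_{i\to+\infty}p_i$. The kernel $P$ is defined on $\{-1,+1\}$ by $P(-1\mid\underline x) := p_{\ell(\underline x)}$ for any $\underline x\neq\mathbf{+1}$ and $P(-1\mid\mathbf{+1}) := p_\infty$. It is clear that this example is attractive. It is also continuous. To see this, observe that, for any symbol $b\in\{-1,+1\}$,
$$\sup_{\underline y,\underline z}\big|P(b\mid a_{-k}^{-1}\underline y) - P(b\mid a_{-k}^{-1}\underline z)\big| = 0$$
for any $a_{-k}^{-1}\in\{-1,+1\}^k$ except $(+1)^k$. Hence,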

$$\sup_{a_{-k}^{-1}}\sup_{\underline y,\underline z}\big|P(b\mid a_{-k}^{-1}\underline y) - P(b\mid a_{-k}^{-1}\underline z)\big| = \sup_{\underline y,\underline z}\big|P(b\mid(+1)^k\underline y) - P(b\mid(+1)^k\underline z)\big|,$$
and thus we obtain that
$$\sup_{a_{-k}^{-1}}\sup_{\underline y,\underline z}\big|P(b\mid a_{-k}^{-1}\underline y) - P(b\mid a_{-k}^{-1}\underline z)\big| = p_k - p_\infty,$$
which goes to $0$ by the definition of the sequence $(p_k)_{k\ge0}$. If $p_\infty = 0$, the kernel is not strongly non-null, and the degenerate chain with all symbols equal to $+1$ is trivially stationary and compatible with $P$. If we further assume $\sum_{k\ge1}\prod_{i=0}^{k-1}(1-p_i)<+\infty$, there exists another class of stationary chains compatible with the kernel $P$. It is the so-called class of renewal chains, obtained by concatenating i.i.d. blocks of the form $(+1,+1,\dots,+1,-1)$ of random length. These blocks have length $k+1$ with probability $\prod_{i=0}^{k-1}(1-p_i)\,p_k$, and therefore have finite expected length. The existence of several phases is due to the fact that this kernel is not irreducible. We refer to Cenac et al. (2010) for more details on this example. Therefore, this chain has a degenerate type of phase transition. Nevertheless, the phase with probability one of having $+1$ is obviously a finitary coding of an i.i.d. process and satisfies the concentration of measure at exponential rate. This shows that if the strong non-nullness assumption is removed, the existence of a finitary coding for the maximum phase does not imply uniqueness of the compatible chain.
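The renewal construction can be made concrete. In the sketch below, $p_k = 2/(k+3)$ is a hypothetical choice satisfying the assumptions: it is decreasing and $p_\infty = 0$. Then $\prod_{i=0}^{k-1}(1-p_i) = 2/((k+1)(k+2))$, the summability condition holds, and the expected block length is $\sum_{k\ge0}\prod_{i=0}^{k-1}(1-p_i) = 2$. The code computes the block-length law $\prod_{i=0}^{k-1}(1-p_i)\,p_k$ and compares the truncated mean with this value.

```python
def block_length_pmf(p, kmax):
    """pmf[k] = P(block has length k + 1) = prod_{i<k} (1 - p(i)) * p(k), k = 0..kmax.

    A block is k symbols +1 followed by one -1.
    """
    pmf, survive = [], 1.0  # survive = prod_{i<k} (1 - p(i))
    for k in range(kmax + 1):
        pmf.append(survive * p(k))
        survive *= 1.0 - p(k)
    return pmf


def p(k):
    """Hypothetical choice p_k = 2 / (k + 3): decreasing, with p_infinity = 0."""
    return 2.0 / (k + 3)


pmf = block_length_pmf(p, 10_000)
mean_length = sum((k + 1) * q for k, q in enumerate(pmf))
print(mean_length)  # close to 2; the truncation error is 4 / 10_003
```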

Relaxing the continuity assumption. For a discontinuous attractive kernel $P$, the maximum and minimum phases are always distinct and not consistent with the kernel $P$ (see Hulse (1991)). Hence, strictly speaking, we do not have a phase transition in the sense of more than one chain compatible with $P$. In fact, this means that considering discontinuous attractive chains does not make much sense from the point of view of non-uniqueness. Nevertheless, we can still ask whether the maximum and minimum phases of a discontinuous attractive kernel can be finitary codings of an i.i.d. process. The example below shows that, in general, this can happen.

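Let $a_{-n}^{-1}\in\{-1,+1\}^n$ and let $\mathbf{-1}$ and $\mathbf{+1}$ be the pasts such that $(\mathbf{+1})_{-j} = +1$ and $(\mathbf{-1})_{-j} = -1$ for all $j$. Let $(\varepsilon_n)_{n\in\mathbb N}$ be a non-increasing sequence of positive numbers such that $\lim_{n\to\infty}\varepsilon_n = \varepsilon$ with $0<\varepsilon<1/2$. We define $P$ by
$$P(1\mid a_{-n}^{-1}(\mathbf{+1})) := 1-\varepsilon_n = P(-1\mid -a_{-n}^{-1}(\mathbf{-1})),$$
$$P(1\mid a_{-n}^{-1}(\mathbf{-1})) := \varepsilon_n = P(-1\mid -a_{-n}^{-1}(\mathbf{+1})),$$
and put $P(1\mid\underline x) = 1/2$ for all the remaining pasts $\underline x$. Clearly $P$ is strongly non-null, attractive, symmetric, i.e., $P(1\mid\underline z) = P(-1\mid-\underline z)$, and non-continuous. Also, let
$$P_+(1\mid\underline a) := \lim_{n\to\infty}P(1\mid a_{-n}^{-1}(\mathbf{+1})) = 1-\varepsilon,$$
$$P_-(-1\mid\underline a) := \lim_{n\to\infty}P(-1\mid a_{-n}^{-1}(\mathbf{-1})) = 1-\varepsilon.$$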



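By Lemma 2.3 in Hulse (1991), the maximum phase $\mathbf X^{(\mathbf{+1})}$ is consistent with $P_+$, and therefore it is the i.i.d. process with probability $1-\varepsilon$ for $+1$. Analogously, the minimum phase $\mathbf X^{(\mathbf{-1})}$ is the i.i.d. process with probability $1-\varepsilon$ for $-1$. Both phases are thus trivially finitary codings of an i.i.d. process.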



Theorem 1 is a direct consequence of Theorems 2, 3 and 4 below.

Theorem 2. Let $P$ be an attractive continuous kernel. If there exists a unique stationary chain compatible with $P$, then this chain can be perfectly sampled by a CFTP algorithm using a discrete-value i.i.d. process.

Observe that for this theorem we do not require strong non-nullness of the kernel. As an immediate application of Theorem 2, our Attractive Sampler given in Section 5 perfectly simulates the binary autoregressive and BK processes introduced in Section 2 in their uniqueness regime. Notice that for any $\eta>0$ we can exhibit, for the binary autoregressive and BK processes, kernels having continuity rate $\mathrm{var}_k = O(1/k^{\eta})$ for which the unique compatible stationary chain can be perfectly simulated. In particular, in the BK example of Friedli (2010), $\mathrm{var}_k$ can be taken so that it converges arbitrarily slowly to $0$. As a comparison, in the work of Comets et al. (2002) the condition $\sum_{k\ge0}\prod_{i=0}^{k}(1-\mathrm{var}_i) = +\infty$ must be satisfied to guarantee the existence of a CFTP algorithm. This condition does not hold if $\mathrm{var}_k$ is of order $1/k^{\eta}$ with $\eta$ sufficiently small. In other words, in the class of regular attractive chains, our perfect simulation algorithm (Attractive Sampler) is optimal.
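The contrast with the condition of Comets et al. (2002) can be seen numerically. The sketch below assumes the hypothetical continuity rates $\mathrm{var}_i = c/(i+1)^{\eta}$ with $c = 1/2$ and computes partial sums of $\sum_{k\ge0}\prod_{i=0}^{k}(1-\mathrm{var}_i)$. For $\eta = 2$ the product converges to a positive limit, so the partial sums grow linearly and the series diverges. For $\eta = 1/2$ the product decays roughly like $e^{-\sqrt k}$ and the partial sums stabilise, so the condition fails, although, as noted above, kernels with such rates can still have a unique compatible chain.

```python
def comets_partial_sum(var, kmax):
    """Partial sum sum_{k=0}^{kmax} prod_{i=0}^{k} (1 - var(i))."""
    total, prod = 0.0, 1.0
    for k in range(kmax + 1):
        prod *= 1.0 - var(k)
        total += prod
    return total


def fast(i):
    """Hypothetical rate var_i = 0.5 / (i + 1)**2 (eta = 2)."""
    return 0.5 / (i + 1) ** 2


def slow(i):
    """Hypothetical rate var_i = 0.5 / (i + 1)**0.5 (eta = 1/2)."""
    return 0.5 / (i + 1) ** 0.5


for name, var in (("eta=2", fast), ("eta=1/2", slow)):
    print(name, [round(comets_partial_sum(var, K), 3) for K in (10**3, 10**4, 2 * 10**4)])
```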



This is particularly clear in the linear case of the binary autoregressive models, which is also considered in Comets et al. (2002). In this case, when $\sum_{n\ge1}\theta_n+\gamma<1$, the criterion of Fernandez & Maillard (2005) implies uniqueness, and therefore our Attractive Sampler works, whereas their algorithm is not guaranteed to work.



The following theorem is closely related to Theorem 3 in Marton & Shields (1994). 



Theorem 3. Let $\mathbf X$ be a FP with stopping time $\theta$. Then the concentration of measure holds at exponential rate for $\mathbf X$.

The above theorem holds for any process that is FP with some stopping time $\theta$ (we assume neither regularity nor attractiveness), and it is of independent interest. It is also interesting to note that the proof technique of the above theorem can be applied to obtain a sharper result in the particular case of processes that have a CFTP algorithm, as is the case, for example, for the processes considered in Comets et al. (2002), Gallo & Garcia (2011), and De Santis & Piccioni (2010). The result below gives an explicit upper bound for processes having a CFTP algorithm with finite expected stopping time.

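Proposition 1. Let $\mathbf X$ be a process that can be simulated by a CFTP algorithm with a stopping time $\theta$. If $\mathbb E[\theta]<\infty$, then for all $\varepsilon>0$ and all functions $f:A^n\to\mathbb R$ we have
$$\mathbb P\big(|f(X_1^n) - \mathbb E[f(X_1^n)]|>\varepsilon\big)\le2\exp\Big(-\frac{2\varepsilon^2}{\cdots}\Big).\tag{5}$$

Now we have the last ingredient for the proof of Theorem 1.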



Theorem 4. Let P be a regular kernel and X a process compatible with P that satisfies the ergodic 
theorem at exponential rate. Then X is the unique stationary process compatible with P. 

Note that, for this result, we do not assume that the kernel is attractive, and therefore Theorem 4 constitutes an interesting characterization of uniqueness for chains of infinite order. We cannot, in general, relax the strong non-nullness condition in this theorem: in the example given soon after Theorem 1, where the kernel has only one null transition probability, there exists a chain that satisfies the ergodic theorem at exponential rate but is not the unique chain compatible with the kernel.

Now, Theorem 1 follows from the chain of implications shown in Figure 1, which holds when the kernel is attractive and regular.



[Figure 1: Uniqueness ⇒ Finitary process (Theorem 2); Finitary process ⇒ Exponential ergodic theorem (Theorem 3); Exponential ergodic theorem ⇒ Uniqueness (Theorem 4).]

Figure 1. Diagram showing the chain of implications stated by Theorem 1.

From the coding point of view, it is natural to ask whether Theorem 1 can be strengthened to a finitary coding from a finite alphabet i.i.d. process. This is indeed the case for the original BK example.

Theorem 5. Let P be the original BK example. Then there exists a unique chain compatible with 
P if and only if the compatible chain is a finitary coding of a finite alphabet i.i.d. process. 
4. Discussion

In this work, we were motivated by the following general question: "what is the relationship between the existence of a CFTP perfect simulation algorithm and the occurrence of phase transition?" A similar question was studied in Steif & van den Berg (1999), where they proved that if a random field is obtained as the invariant measure of a monotonic, exponentially ergodic probabilistic cellular automaton, then the random field is a finitary coding of a finite alphabet i.i.d. random field. They proved the existence of a CFTP algorithm for this class of processes and then used this result to show that there exists a finitary coding from a finite alphabet i.i.d. process to the plus phase of a ferromagnetic Ising model above the critical temperature (that is, for the interaction parameter below its critical value). They also proved that for the plus phase of an Ising model below the critical temperature, there is no finitary coding from a finite alphabet i.i.d. process.

It is natural to ask how much of this program can be carried out and improved for other stochastic processes. In this article, we consider the regular chains of infinite order, which are a natural generalization of finite order Markov chains. This class of stochastic models has some similarities with Gibbs measures, although the theory developed for Gibbs measures cannot be applied directly to chains of infinite order, as there exist regular chains of infinite order that are not Gibbs (Fernandez et al. 2011). Also, it is fair to say that much less is known about chains of infinite order than about Gibbs measures. For instance, the loss-of-memory behavior of unique chains of infinite order is not known in general, even for specific models such as the BK example, and therefore conditions like exponential ergodicity cannot be assumed a priori.

The main result of this article is Theorem 1, which gives a satisfactory answer to our general question, stating that, for attractive regular kernels, uniqueness of the compatible chain is equivalent to the existence of a CFTP algorithm using a discrete-value i.i.d. process, which is itself equivalent to saying that the measure "enjoys good concentration". For the proof of this theorem we have three ingredients. The first is the equivalence between uniqueness of the compatible chain for attractive regular kernels and the existence of a CFTP perfect simulation algorithm for the compatible chain. As a corollary of this result, we obtain a perfect simulation algorithm for a large class of models for which this was previously not known to be possible. The second ingredient is the fact that if a process is a finitary coding of an i.i.d. process, then it has the concentration of measure at exponential rate. This result gives an explicit upper bound for the concentration of measure property based on the tail probability of the stopping time of the finitary coding. We note that the blowing-up property (Marton & Shields 1994) could be used instead of our concentration of measure result (Theorem 3), but we think that our result gives more explicit information about the process. Finally, the third ingredient is the proof that if a process compatible with a regular kernel satisfies the ergodic theorem at exponential rate, then it is the unique compatible chain.



Petit (1982) proved that discrete-value i.i.d. processes with infinite entropy are finitarily isomorphic. This also implies that, given a discrete-value i.i.d. process with finite or infinite entropy, there exists a finitary coding from a single fixed discrete-value i.i.d. process with infinite entropy to the given i.i.d. process. Therefore, the finitary coding considered in this article can be made source universal, i.e., the discrete-value i.i.d. process used in the finitary coding does not depend on the particular attractive regular chain. Because the entropy of an attractive regular chain is finite but can be arbitrarily large depending on the choice of the model, there is no hope of obtaining a finitary coding from a universal finite alphabet i.i.d. process. Therefore, when considering source universal coding, our result is optimal with respect to the cardinality of the alphabet of the coding process.

For non-attractive and non-regular cases, we also obtain some results. Theorem 2 shows that for attractive continuous, not necessarily non-null, kernels, uniqueness implies the existence of a CFTP perfect simulation algorithm for the unique chain. Conversely, a routine argument shows that feasibility of our CFTP algorithm (Attractive Sampler) implies uniqueness. We note that this equivalence does not necessarily hold for every CFTP algorithm, since we exhibited an example of an attractive continuous kernel with a single null transition probability that has more than one compatible measure and for which the maximum phase is a finitary coding of an i.i.d. process. The important observation is that, in this example, the coding is not realized by the Attractive Sampler.

Theorem 3 shows that if a process, not necessarily attractive or regular, can be sampled by a CFTP algorithm, then the concentration of measure holds at exponential rate. Therefore, combining Theorems 2 and 3, we have that if a process compatible with an attractive continuous kernel does not satisfy the ergodic theorem at exponential rate, then this kernel exhibits phase transition. We do not pursue this observation further in this work, but it seems interesting to investigate whether one can prove the existence of phase transition by studying the rate of convergence in the ergodic theorem.

Finally, we show that the chain compatible with the original BK example is a finitary coding of a finite alphabet i.i.d. process if and only if it is unique. For the BK example, it is not clear to us what quantity plays the role of the statistical mechanics notion of temperature, and therefore we cannot make a one-to-one comparison with Theorem 1.1 in Steif & van den Berg (1999). Nevertheless, we observe that Steif & van den Berg (1999) were able to show, for the ferromagnetic Ising model, that the existence of a finitary coding from a finite alphabet i.i.d. process was equivalent to uniqueness only up to the critical temperature, whereas Theorem 5 has no such restriction.

5. The Attractive Sampler 

Assume that we are given an attractive continuous kernel $P$ for which there exists a unique compatible stationary chain. The alphabet is $A = \{1,\dots,s\}$. As stated by Theorem 2, there exists a CFTP algorithm that samples from its stationary law. Here we construct one such CFTP algorithm, which we call the Attractive Sampler. First, let us introduce the kernel $\bar P$ on $A\times A$ defined as
$$\bar P\big((a'\ge a,b'\ge b)\mid(\underline x,\underline y)\big) := \sum_{i=a}^{s}P(i\mid\underline x)\wedge\sum_{i=b}^{s}P(i\mid\underline y),\tag{6}$$

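for any pair of pasts $\underline x$ and $\underline y$ and any pair of symbols $a$ and $b$ in $A$. This kernel defines a coupling between the kernels $(P(a\mid\underline x))_{a\in A}$ and $(P(a\mid\underline y))_{a\in A}$. To see this, first observe that
$$\sum_{a=1}^{s}\bar P((a,s)\mid(\underline x,\underline y)) = \bar P((a'\ge1,b'\ge s)\mid(\underline x,\underline y)) = P(s\mid\underline y);$$
secondly, observe that
$$\begin{aligned}\sum_{a=1}^{s}\bar P((a,s-1)\mid(\underline x,\underline y)) &= \bar P((a'\ge1,b'\ge s-1)\mid(\underline x,\underline y)) - \bar P((a'\ge1,b'\ge s)\mid(\underline x,\underline y))\\ &= [P(s-1\mid\underline y)+P(s\mid\underline y)] - P(s\mid\underline y)\\ &= P(s-1\mid\underline y).\end{aligned}$$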
Continuing the recursion, we obtain $\sum_{a=1}^{s}\bar P((a,b)\mid(\underline x,\underline y)) = P(b\mid\underline y)$ for any $b\in A$. The same holds for the sum over $b$, that is, $\sum_{b=1}^{s}\bar P((a,b)\mid(\underline x,\underline y)) = P(a\mid\underline x)$ for any $a\in A$. This means that $\bar P$ defines a coupling between the chains with respective fixed pasts.
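The coupling (6) can be checked numerically. Write $Q(a,b)$ for the right-hand side of (6), with $Q = 0$ when an index exceeds $s$. The sketch recovers the point masses of the coupling by inclusion-exclusion and verifies that the marginals are the two given laws. The input laws are hypothetical. When the tail sums of the first law are dominated by those of the second (as for $\underline x\le\underline y$ and an attractive kernel), all the mass sits on pairs with $a\le b$.

```python
def coupling_matrix(px, py):
    """Point masses of the coupling (6) of two laws px, py on {1, ..., s}.

    Q(a, b) = min(G_x(a), G_y(b)) with G(a) = sum_{i >= a} p(i); then
    Pbar(a, b) = Q(a, b) - Q(a+1, b) - Q(a, b+1) + Q(a+1, b+1), Q = 0 beyond s.
    Indices are 0-based in the code: M[a][b] is Pbar((a + 1, b + 1)).
    """
    s = len(px)
    gx = [sum(px[a:]) for a in range(s)] + [0.0]
    gy = [sum(py[b:]) for b in range(s)] + [0.0]

    def q(a, b):
        return min(gx[a], gy[b])

    return [[q(a, b) - q(a + 1, b) - q(a, b + 1) + q(a + 1, b + 1) for b in range(s)]
            for a in range(s)]


px, py = [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]
M = coupling_matrix(px, py)
print([round(sum(row), 12) for row in M])  # approximately px = [0.5, 0.3, 0.2]
```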

A straightforward but tedious computation shows that $\bar P$ is indeed continuous, and we can use the result of Kalikow (1990) stating that continuous kernels can be written as a countable mixture of Markovian kernels of increasing order. Formally, for $\bar P$ there exist a sequence of non-negative numbers $(\lambda_k)_{k\ge0}$ with $\sum_{k\ge0}\lambda_k = 1$ and a sequence of Markov kernels $(\bar P^{[k]})_{k\ge0}$, where $\bar P^{[k]}$ is a $k$-step Markov kernel, such that
$$\bar P((a,b)\mid(\underline x,\underline y)) = \sum_{k\ge0}\lambda_k\,\bar P^{[k]}\big((a,b)\mid(x_{-k}^{-1},y_{-k}^{-1})\big).$$

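We will now use this representation of $\bar P$ to define our algorithm. For this, for any integer $k\ge0$, let
$$\mathcal R^{[k]} := \big\{\bar P^{[k]}\big((a'\ge a,b'\ge b)\mid(x_{-k}^{-1},y_{-k}^{-1})\big)\in[0,1] : (a,b)\in A^2,\ (x_{-k}^{-1},y_{-k}^{-1})\in A^{2k}\big\}.$$
Denote the elements of $\mathcal R^{[k]}$ by $r_{k,1}<r_{k,2}<\dots<r_{k,|\mathcal R^{[k]}|} = 1$. Then we define $q_{k,1} := r_{k,1}$ and, for $1<j\le|\mathcal R^{[k]}|$, we put $q_{k,j} := r_{k,j}-r_{k,j-1}$.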

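Now, let $\mathbf L = \{L_j\}_{j\in\mathbb Z}$ be an i.i.d. process with values in $\mathbb N$ such that, for all $k\in\mathbb N$ and $j\in\{1,\dots,|\mathcal R^{[k]}|\}$,

$$\mathbb P\Big(L_0 = \sum_{m=-1}^{k-1}|\mathcal R^{[m]}| + j\Big) = \lambda_k\,q_{kj},$$

where we define $\mathcal R^{[-1]} := \emptyset$.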
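The labelling of $L_0$ is a plain enumeration of the pairs $(k,j)$, block by block. The Python sketch below makes the encoding and its inverse explicit; it assumes, for illustration only, that finitely many levels k are kept (the construction uses all $k\ge0$) and that the sizes $|\mathcal R^{[k]}|$ are given. The helpers `offsets`, `encode`, `decode` and `label_law` are hypothetical names.

```python
from bisect import bisect_left
from fractions import Fraction
from itertools import accumulate
from typing import Dict, List, Sequence, Tuple


def offsets(sizes: Sequence[int]) -> List[int]:
    """offsets[k] = |R^[-1]| + |R^[0]| + ... + |R^[k-1]|, with |R^[-1]| = 0."""
    return [0] + list(accumulate(sizes))


def encode(k: int, j: int, sizes: Sequence[int]) -> int:
    """Value of L_0 that stands for the pair (k, j), where 1 <= j <= |R^[k]|."""
    if not (0 <= k < len(sizes) and 1 <= j <= sizes[k]):
        raise ValueError("pair (k, j) out of range")
    return offsets(sizes)[k] + j


def decode(label: int, sizes: Sequence[int]) -> Tuple[int, int]:
    """Inverse of encode: k is the largest block whose offset is strictly below the label."""
    off = offsets(sizes)
    k = bisect_left(off, label) - 1
    if not 0 <= k < len(sizes):
        raise ValueError("label out of range")
    return k, label - off[k]


def label_law(lambdas: Sequence[Fraction], qs: Sequence[Sequence[Fraction]]) -> Dict[int, Fraction]:
    """P(L_0 = encode(k, j)) = lambda_k * q_{kj} for a truncated family of levels."""
    sizes = [len(qk) for qk in qs]
    return {
        encode(k, j, sizes): lambdas[k] * qk[j - 1]
        for k, qk in enumerate(qs)
        for j in range(1, len(qk) + 1)
    }
```

Decoding a label therefore recovers which Markov component k and which threshold $r_{kj}$ drive the update at that time.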

We will use the i.i.d. process $\mathbf L$ to define our algorithm. For this, we define the update function $F:(A^{-\mathbb N})^2\times\mathbb N\to A^2$ by

$$F\big((x,y),L_0\big) := (a,b) \tag{7}$$

if $L_0 = \sum_{m=-1}^{k-1}|\mathcal R^{[m]}| + j$ and if, for all $(c,d)\in A^2$,

$$\bar P^{[k]}\big((a'\ge c,\ b'\ge d)\mid(x_{-k}^{-1},y_{-k}^{-1})\big)\ge r_{kj}\iff(c,d)\le(a,b).$$
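To see why thresholding at the values $r_{kj}$ reproduces the coupling, consider the special case where, for the given pasts, the component kernel has the min form (6) with one-step laws p and q (an assumption made only for this illustration). The Python sketch below picks, for a threshold r, the pair $(a,b)$ whose lower set is exactly $\{(c,d):\min(\mathrm{tail}_p(c),\mathrm{tail}_q(d))\ge r\}$, and weights each threshold by $r_j-r_{j-1}$, mirroring $q_{kj}$. The names `tails`, `update` and `law_of_update` are hypothetical.

```python
from fractions import Fraction
from typing import Dict, List, Sequence, Tuple


def tails(p: Sequence[Fraction]) -> List[Fraction]:
    """t[a] = p(a) + ... + p(s) for a = 1..s; t[0] is unused and t[s + 1] = 0."""
    s = len(p)
    t = [Fraction(0)] * (s + 2)
    for a in range(s, 0, -1):
        t[a] = t[a + 1] + p[a - 1]
    return t


def update(r: Fraction, tp: Sequence[Fraction], tq: Sequence[Fraction]) -> Tuple[int, int]:
    """The pair (a, b) with: min(tp[c], tq[d]) >= r  iff  c <= a and d <= b.  Requires 0 < r <= 1."""
    s = len(tp) - 2
    a = max(c for c in range(1, s + 1) if tp[c] >= r)
    b = max(d for d in range(1, s + 1) if tq[d] >= r)
    return a, b


def law_of_update(p: Sequence[Fraction], q: Sequence[Fraction]) -> Dict[Tuple[int, int], Fraction]:
    """Law of update(r_j) when r_j is selected with probability r_j - r_{j-1}."""
    tp, tq = tails(p), tails(q)
    s = len(p)
    # Threshold set playing the role of R^[k]: all upper-orthant probabilities, zero excluded.
    thresholds = sorted({min(tp[c], tq[d]) for c in range(1, s + 1) for d in range(1, s + 1)} - {Fraction(0)})
    law: Dict[Tuple[int, int], Fraction] = {}
    previous = Fraction(0)
    for r in thresholds:
        ab = update(r, tp, tq)
        law[ab] = law.get(ab, Fraction(0)) + (r - previous)  # weight q_j = r_j - r_{j-1}
        previous = r
    return law
```

Using the right endpoint $r_j$ as representative is harmless here: every tail value of p and of q belongs to the threshold set, so the selected pair is constant for all thresholds in $(r_{j-1},r_j]$.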

Let the concatenation of pairs of symbols be understood coordinatewise, i.e., $(a,b)(c,d) = (ac,bd)$ whenever $(ac,bd)$ is well defined. Using this notation, we define, for any $-\infty<k\le l<+\infty$, the successive iterations of F as

$$F_{[k,l]}\big((x,y),L_k^l\big) := F\Big((x,y)\,F_{[k,k]}\big((x,y),L_k\big)\cdots F_{[k,l-1]}\big((x,y),L_k^{l-1}\big),\ L_l\Big), \tag{8}$$

where $F_{[k,k]}((x,y),L_k) := F((x,y),L_k)$. Notice that for any $(x,y)$ and $(a,b)$,

$$\mathbb P\big(F((x,y),L_0) = (a,b)\big) = \bar P\big((a,b)\mid(x,y)\big),$$

and therefore we obtain a coupling $\{(X_i^x,X_i^y)\}_{i\ge0}$ because

$$F_{[0,i]}\big((x,y),L_0^i\big) = (X_i^x,X_i^y).$$

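Furthermore, we define, for any $i\in\mathbb Z$, the random variable

$$\theta[i] := \min\Big\{j\ge0 : F_{[i-j,i]}\big((a,s),L_{i-j}^{i}\big)\in\{(1,1),\dots,(s,s)\}\ \text{for all } a\in A^{-\mathbb N}\Big\}, \tag{9}$$

and an important observation is that, in the particular case of attractive chains,

$$\theta[0] = \min\Big\{j\ge0 : F_{[-j,0]}\big((1,s),L_{-j}^{0}\big)\in\{(1,1),\dots,(s,s)\}\Big\}, \tag{10}$$

where, inside F, the symbols 1 and s stand for the constant pasts $\dots111$ and $\dots sss$.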
Finally, define the reconstruction function of time $i\in\mathbb Z$ by

$$[\Phi(\mathbf L)]_i := F_{[i-\theta[i],\,i]}\big((1,s),L_{i-\theta[i]}^{i}\big). \tag{11}$$
It can be shown in a standard way (see, for example, Propp & Wilson (1996) for the Markovian case, or Comets et al. (2002) for chains of infinite order) that if θ[0] is $\mathbb P$-a.s. finite (that is, the Attractive Sampler is feasible), the sample $[\Phi(\mathbf L)]_0$ is distributed according to the unique stationary measure compatible with P. In fact, compatibility and stationarity follow from the construction. Uniqueness follows from the loss of memory that the chain inherits from the almost surely finite "regeneration times" θ[i], $i\in\mathbb Z$.

Thus, this is a finitary coding which is particular in that $\{\theta[0] = k\}$ is $\mathcal F(L_{-k}^0)$-measurable. In Section 6.1 (proof of Theorem 2), we will prove that for attractive continuous P, the Attractive Sampler is feasible if and only if P has a unique compatible chain.



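Algorithm 1 Attractive Sampler
1: Input: F; Output: $\theta[0]$, $[\Phi(\mathbf L)]_0$
2: Sample $L_0$ with distribution $\mathbb P(L_0\in\cdot)$
3: $i\leftarrow0$, $\theta[0]\leftarrow0$, $[\Phi(\mathbf L)]_0\leftarrow*$
4: while $F_{[-i,0]}((1,s),L_{-i}^0)\notin\{(1,1),\dots,(s,s)\}$ do
5:     $i\leftarrow i+1$
6:     Sample $L_{-i}$ with distribution $\mathbb P(L_0\in\cdot)$
7: end while
8: $\theta[0]\leftarrow i$
9: $[\Phi(\mathbf L)]_0\leftarrow F_{[-i,0]}((1,s),L_{-i}^0)$
10: return $\theta[0]$, $[\Phi(\mathbf L)]_0$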
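For intuition, here is a minimal Python sketch of Algorithm 1 in the simplest attractive case: a first-order Markov kernel on $\{1,\dots,s\}$ whose rows are stochastically increasing. As a simplifying assumption, each $L_{-i}$ is replaced by a single uniform threshold shared by both coordinates (the comonotone coupling (6)) instead of the integer labels built above. The names `threshold_update` and `attractive_sampler` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def threshold_update(row: Sequence[float], u: float) -> int:
    """Largest symbol a in {1, ..., s} with P(a' >= a) >= u under the one-step law `row`."""
    tail = 0.0
    for a in range(len(row), 0, -1):
        tail += row[a - 1]
        if tail >= u:
            return a
    return 1  # rounding guard: the full tail P(a' >= 1) equals 1 >= u


def attractive_sampler(P: Sequence[Sequence[float]], rng: random.Random,
                       max_steps: int = 100_000) -> Tuple[int, int]:
    """Coupling from the past started at the extremal pasts 1 and s, for an attractive order-1 kernel.

    P[x - 1][y - 1] is the probability of symbol y after symbol x. Returns (theta, sample).
    """
    s = len(P)
    us: List[float] = []  # us[i] is the randomness attached to time -i; it is reused, never redrawn
    while len(us) < max_steps:
        us.append(1.0 - rng.random())  # uniform on (0, 1]
        low, high = 1, s  # chains started at time -i from the pasts 1 and s
        for u in reversed(us):  # times -i, -i+1, ..., 0
            low, high = threshold_update(P[low - 1], u), threshold_update(P[high - 1], u)
        if low == high:  # F_[-i,0]((1,s), .) reached the diagonal
            return len(us) - 1, low
    raise RuntimeError("no coalescence within max_steps")
```

Recomputing $F_{[-i,0]}$ from scratch at every step, as Algorithm 1 does, costs $O(\theta[0]^2)$ updates; the doubling schedule of Propp & Wilson (1996) would be cheaper, but it is not what Algorithm 1 describes.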



6. Proof of the results 

From Section 3 it is clear that Theorem 1 follows directly from Theorems 2, 3 and 4.

6.1. Proof of Theorem 2. First we need the following lemma, proved for regular and attractive kernels by Hulse (1991). Here, we drop the unnecessary non-nullness condition.

Lemma 1. Let P be attractive and continuous and consider the function F defined in Section 5. If there exists a unique chain compatible with P, then for all $i\in\{2,\dots,s\}$,

$$\lim_{n\to\infty}\Big[\mathbb P\big(F_{[0,n]}((1,s),L_0^n)\ge(i,1)\big)-\mathbb P\big(F_{[0,n]}((1,s),L_0^n)\ge(1,i)\big)\Big] = 0. \tag{12}$$

Observation 1. Remember that $F_{[0,n]}((1,s),L_0^n) = (X_n^1,X_n^s)$; therefore, for any $i\in A$, $\mathbb P(F_{[0,n]}((1,s),L_0^n)\ge(i,1)) = \mathbb P(X_n^1\ge i)$, and these tail probabilities give the law of $X_n^1$.

Proof. For $a,b,c,d\in A$, we write $(a,b)\ge(c,d)$ if $a\ge c$ and $b\ge d$. Because P is attractive, for all $l_1,\dots,l_k$, $k\in\mathbb N$ and $a_1,\dots,a_k\in A$,

$$\mathbb P\Big(F_{[0,n+l_1]}\big((1,s),L_0^{n+l_1}\big)\ge(1,a_1),\dots,F_{[0,n+l_k]}\big((1,s),L_0^{n+l_k}\big)\ge(1,a_k)\Big)$$

and

$$\mathbb P\Big(F_{[0,n+l_1]}\big((1,s),L_0^{n+l_1}\big)\ge(a_1,1),\dots,F_{[0,n+l_k]}\big((1,s),L_0^{n+l_k}\big)\ge(a_k,1)\Big) \tag{13}$$

are respectively non-increasing and non-decreasing in $n\in\mathbb N$. Therefore, both sequences converge as $n\to\infty$, and their limits define stationary chains. By construction, for all $b\in A$ and $a\in A^{-\mathbb N}$, the respective chains are compatible with the kernels $P^{\max}$ and $P^{\min}$ given by $P^{\max}(b\mid a) = \lim_{n\to\infty}P(b\mid a_{-n}^{-1},s)$ and $P^{\min}(b\mid a) = \lim_{n\to\infty}P(b\mid a_{-n}^{-1},1)$.

If the kernel P is continuous, $P^{\max} = P^{\min} = P$, and therefore both chains are compatible with P. This implies that if there exists only one chain compatible with P, then, for $i = 2,\dots,s$, we have

$$\lim_{n\to\infty}\Big[\mathbb P\big(F_{[0,n]}((1,s),L_0^n)\ge(i,1)\big)-\mathbb P\big(F_{[0,n]}((1,s),L_0^n)\ge(1,i)\big)\Big] = 0,$$

as we wanted.

□
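The monotone behaviour in (13) and the vanishing gap in (12) can be observed exactly on a small example. The Python sketch below assumes an attractive first-order kernel on $\{1,2,3\}$ (rows stochastically increasing) and computes, with exact rational arithmetic, the tails $\mathbb P(X_m^1\ge i)$ and $\mathbb P(X_m^s\ge i)$; `step` and `extremal_tails` are hypothetical names and the kernel is an arbitrary illustrative choice.

```python
from fractions import Fraction
from typing import List, Sequence, Tuple


def step(dist: Sequence[Fraction], P: Sequence[Sequence[Fraction]]) -> List[Fraction]:
    """One step of the chain: law at time m -> law at time m + 1."""
    s = len(P)
    return [sum(dist[x] * P[x][y] for x in range(s)) for y in range(s)]


def extremal_tails(P: Sequence[Sequence[Fraction]], n: int) -> List[Tuple[List[Fraction], List[Fraction]]]:
    """For m = 1..n: ([P(X_m^1 >= i)] for i = 2..s, [P(X_m^s >= i)] for i = 2..s)."""
    s = len(P)
    low = [Fraction(1)] + [Fraction(0)] * (s - 1)   # chain started from symbol 1
    high = [Fraction(0)] * (s - 1) + [Fraction(1)]  # chain started from symbol s
    out = []
    for _ in range(n):
        low, high = step(low, P), step(high, P)
        out.append(([sum(low[i - 1:]) for i in range(2, s + 1)],
                    [sum(high[i - 1:]) for i in range(2, s + 1)]))
    return out
```

In this finite-state example the compatible chain is unique, so the two families of tails squeeze together, as (12) asserts.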



We now prove that, for attractive regular chains, convergence (12) implies that the unique stationary chain compatible with P can be sampled by a CFTP algorithm. We have to construct two functions Φ and θ that satisfy the conditions given in Section 2 for the definition of CFTP and FP.

We consider the functions used for the Attractive Sampler defined in Section 5, and it only remains to prove that θ[0] is $\mathbb P$-a.s. finite. We have

$$\mathbb P(\theta[0]>n) = \mathbb P\Big(\bigcap_{j=0}^{n}\big\{F_{[-j,0]}\big((1,s),L_{-j}^0\big)\notin\{(1,1),\dots,(s,s)\}\big\}\Big),$$

which yields, using first the attractiveness and then the translation invariance of $\mathbf L$,

$$\begin{aligned}
\mathbb P(\theta[0]>n) &= \mathbb P\Big(F_{[-n,0]}\big((1,s),L_{-n}^0\big)\notin\{(1,1),\dots,(s,s)\}\Big)\\
&= \mathbb P\Big(F_{[0,n]}\big((1,s),L_0^n\big)\notin\{(1,1),\dots,(s,s)\}\Big).
\end{aligned}$$

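Thus, we want to prove that

$$\lim_{n\to\infty}\mathbb P\Big(F_{[0,n]}\big((1,s),L_0^n\big)\notin\{(1,1),\dots,(s,s)\}\Big) = 0.$$

Due to attractiveness, our coupling guarantees that, for $a\in A$,

$$\mathbb P\Big(F_{[0,n]}\big((1,s),L_0^n\big)\ge(a,a)\Big) = \mathbb P\Big(F_{[0,n]}\big((1,s),L_0^n\big)\ge(a,1)\Big).$$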



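Taking the limit and using (12), we have, for any $a\in A$,

$$\lim_{n\to\infty}\mathbb P\big(F_{[0,n]}((1,s),L_0^n)\ge(a,a)\big) = \lim_{n\to\infty}\mathbb P\big(F_{[0,n]}((1,s),L_0^n)\ge(a,1)\big) = \lim_{n\to\infty}\mathbb P\big(F_{[0,n]}((1,s),L_0^n)\ge(1,a)\big).$$

Now, for any $a\in A$, let $\Gamma(a) = \{(i,j)\in A^2 : i<a \text{ and } j\ge a\}$. Since $\mathbb P(F_{[0,n]}((1,s),L_0^n)\in\Gamma(a)) = \mathbb P(F_{[0,n]}((1,s),L_0^n)\ge(1,a))-\mathbb P(F_{[0,n]}((1,s),L_0^n)\ge(a,a))$, the above equation gives

$$\lim_{n\to\infty}\mathbb P\big(F_{[0,n]}((1,s),L_0^n)\in\Gamma(a)\big) = 0.$$

By attractiveness, the first coordinate of $F_{[0,n]}((1,s),L_0^n)$ never exceeds the second one, so every off-diagonal value $(i,j)$ satisfies $i<j$ and belongs to $\Gamma(j)$. This implies

$$\lim_{n\to\infty}\mathbb P\big(F_{[0,n]}((1,s),L_0^n)\notin\{(1,1),\dots,(s,s)\}\big) = 0,$$

which concludes the proof.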
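The identity $\mathbb P(\theta[0]>n) = \mathbb P(F_{[0,n]}((1,s),L_0^n)\notin\{(1,1),\dots,(s,s)\})$ turns the feasibility question into a forward computation on the pair chain. The Python sketch below performs that computation exactly for a small attractive first-order kernel, driving both coordinates with a common uniform threshold (an illustrative assumption in place of the labels $L_j$); `tails`, `pair_step` and `noncoalescence` are hypothetical names.

```python
from fractions import Fraction
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[int, int]


def tails(row: Sequence[Fraction]) -> List[Fraction]:
    """t[a] = P(a' >= a) for a = 1..s; t[0] is unused and t[s + 1] = 0."""
    s = len(row)
    t = [Fraction(0)] * (s + 2)
    for a in range(s, 0, -1):
        t[a] = t[a + 1] + row[a - 1]
    return t


def pair_step(dist: Dict[Pair, Fraction], P: Sequence[Sequence[Fraction]]) -> Dict[Pair, Fraction]:
    """One step of the pair chain when both coordinates use the same uniform threshold."""
    s = len(P)
    new: Dict[Pair, Fraction] = {}
    for (x, y), w in dist.items():
        tx, ty = tails(P[x - 1]), tails(P[y - 1])
        cuts = sorted((set(tx[1:s + 1]) | set(ty[1:s + 1])) - {Fraction(0)})
        previous = Fraction(0)
        for r in cuts:  # the update is constant for thresholds in (previous, r]
            a = max(c for c in range(1, s + 1) if tx[c] >= r)
            b = max(d for d in range(1, s + 1) if ty[d] >= r)
            new[(a, b)] = new.get((a, b), Fraction(0)) + w * (r - previous)
            previous = r
    return new


def noncoalescence(P: Sequence[Sequence[Fraction]], n: int) -> List[Fraction]:
    """[P(theta[0] > m) for m = 0..n-1], i.e. the off-diagonal mass of F_[0,m]((1,s), L_0^m)."""
    s = len(P)
    dist: Dict[Pair, Fraction] = {(1, s): Fraction(1)}
    out = []
    for _ in range(n):
        dist = pair_step(dist, P)
        out.append(sum(w for (a, b), w in dist.items() if a != b))
    return out
```

For the kernel used in the check below, the off-diagonal mass is multiplied by exactly 3/10 at each step after the first, so θ[0] has a geometric tail.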
6.2. Proof of Theorem 3. Let X be any process with alphabet A. For all $x\in A^{-\mathbb N}$, let $X^x$ be the process with fixed past x. Let $y,z\in A^{-\mathbb N}$ and let $(X^y,X^z)$ be a coupling between the processes $X^y$ and $X^z$. The following lemma, which we state without proof, is a direct consequence of Theorem 1 of Chazottes et al. (2007).



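Lemma 2 (Chazottes et al. (2007)). Let $(X^y,X^z)$ be couplings for each pair $y,z\in A^{-\mathbb N}$. If $\sup_{y,z}\sum_{j\ge1}\mathbb P(X_j^y\ne X_j^z)\le\Delta<\infty$, then for all integers $n\ge1$, functions $f:A^n\to\mathbb R$, and $\epsilon>0$, we have

$$\mathbb P\Big(\big|f(X_1^n)-\mathbb E[f(X_1^n)]\big|>\epsilon\Big)\le2\exp\Big(-\frac{2\epsilon^2}{(1+\Delta)^2\|\delta f\|^2_{\ell^2(\mathbb N)}}\Big).$$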

Assume that the process X is a finitary coding of a sequence $\mathbf U$, where $U_i\in\mathcal U$ for any $i\in\mathbb Z$. Let θ and Φ be the quantities involved in the finitary coding as defined generically in Section 2 (and not necessarily as in Section 5). Let $\epsilon>0$ and $\|\delta f\|_{\ell^1(\mathbb N)}<\gamma$. Take $r\in\mathbb N$ such that

$$\mathbb P(\theta(\mathbf U)>r)<\frac{\epsilon}{8\gamma}. \tag{14}$$

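For $j\in\mathbb Z$, let $\mathbf V^{[j]}$ be a family of i.i.d. processes with values in $\mathcal U$, independent of each other and of $\mathbf U$. We define a process Y as

$$Y_j := \Big[\Phi\big(V^{[j]}{}_{-\infty}^{\,j-r-1}\,U_{j-r}^{+\infty}\big)\Big]_j,\quad\text{for any } j\in\mathbb Z,$$

where, for any i and j in $\mathbb Z$, we use the notation $v_i^{+\infty}$ for the sequence $v_iv_{i+1}\dots$. Clearly, Y is stationary, and if $\theta(\mathbf U)\le r$, then $Y_0 = X_0$. Moreover, Y is a 2r-dependent process, i.e., for all $l,m\ge1$ and $y\in A^{\mathbb Z}$,

$$\mathbb P\big(Y_1^l = y_1^l,\ Y_{l+2r+1}^{l+2r+m} = y_{l+2r+1}^{l+2r+m}\big) = \mathbb P\big(Y_1^l = y_1^l\big)\,\mathbb P\big(Y_{l+2r+1}^{l+2r+m} = y_{l+2r+1}^{l+2r+m}\big).$$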

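Now, observe that

$$f(X_1^n)-\mathbb E[f(X_1^n)] = f(X_1^n)-f(Y_1^n)-\mathbb E\big[f(X_1^n)-f(Y_1^n)\big]+f(Y_1^n)-\mathbb E[f(Y_1^n)]. \tag{15}$$

From the definition of $\delta f$, we have that

$$\big|f(X_1^n)-f(Y_1^n)\big|\le\sum_{j=1}^{n}\mathbf 1\{X_j\ne Y_j\}\,\delta_jf\le\sum_{j=1}^{n}\mathbf 1\{\theta(T^j\mathbf U)>r\}\,\delta_jf \tag{16}$$

and

$$\big|\mathbb E\big[f(X_1^n)-f(Y_1^n)\big]\big|\le\mathbb P(\theta(\mathbf U)>r)\sum_{j=1}^{n}\delta_jf<\frac{\epsilon}{8}. \tag{17}$$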

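Let $\alpha_j = \delta_jf/\|\delta f\|_{\ell^1(\mathbb N)}$. Collecting (15), (16) and (17), we have

$$\mathbb P\Big(\big|f(X_1^n)-\mathbb E[f(X_1^n)]\big|>\epsilon\Big)\le\mathbb P\Big(\Big|\sum_{j=1}^{n}\alpha_j\mathbf 1\{\theta(T^j\mathbf U)>r\}-\mathbb P(\theta(\mathbf U)>r)\Big|>\frac{\epsilon}{4\|\delta f\|_{\ell^1(\mathbb N)}}\Big)+\mathbb P\Big(\big|f(Y_1^n)-\mathbb E[f(Y_1^n)]\big|>\frac{\epsilon}{2}\Big). \tag{18}$$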



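We will use Lemma 2 to obtain upper bounds for the two terms on the right-hand side of (18).

Let us begin with the second term, for which we use the fact that Y is a 2r-dependent process. First we define a coupling $(Y,Y')$, where Y and Y' are copies of Y, by

$$(Y_j,Y'_j) := \Big(\big[\Phi\big(V^{[j]}{}_{-\infty}^{\,j-r-1}U_{j-r}^{+\infty}\big)\big]_j,\ \big[\Phi\big(V'^{[j]}{}_{-\infty}^{\,j-r-1}U'^{+\infty}_{j-r}\big)\big]_j\Big),\quad\text{for all } j\in\mathbb Z,$$

where $\mathbf V^{[j]},\mathbf V'^{[j]}$ and $\mathbf U,\mathbf U'$ are i.i.d. processes satisfying the following properties. For $j>0$, $\mathbf V^{[j]} = \mathbf V'^{[j]}$. For $j\le0$, $\mathbf V^{[j]}$ is independent of $\mathbf V'^{[j]}$, and both are independent of the rest. The processes $\mathbf U,\mathbf U'$ are independent of $\mathbf V^{[j]},\mathbf V'^{[j]}$ for all $j\in\mathbb Z$. Also, $\{U_j\}_{j\le r}$ and $\{U'_j\}_{j\le r}$ are independent, and $U'_j = U_j$ for $j>r$. From the construction, we have that, for $j>2r$, $\mathbb P(Y_j = Y'_j) = 1$. Moreover, for all $y,z\in A^{-\mathbb N}$ and $j>2r$,

$$\mathbb P\big(Y_j = Y'_j\mid Y_{-\infty}^0 = y,\ Y'^{\,0}_{-\infty} = z\big) = \mathbb P(Y_j = Y'_j).$$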

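Therefore, using the above coupling and Lemma 2, we have

$$\mathbb P\Big(\big|f(Y_1^n)-\mathbb E[f(Y_1^n)]\big|>\frac{\epsilon}{2}\Big)\le2\exp\Big(-\frac{S_1(\epsilon,\gamma)}{\|\delta f\|^2_{\ell^2(\mathbb N)}}\Big), \tag{19}$$

where $S_1(\epsilon,\gamma) = \frac12\epsilon^2\big(1+r+\sum_{j\ge1}\mathbb P(\theta(\mathbf U)>j)\big)^{-2}$. Note that r depends on ε and γ.

For the first term on the right-hand side of (18), let $Z_j = \mathbf 1\{\theta(T^j\mathbf U)>r\}$. Observe that the process $Z = \{Z_j\}_{j\in\mathbb Z}$ is stationary. Also, because the event $\{\theta(\mathbf U)\le r\}$ is $\mathcal F(U_{-r}^0)$-measurable, the process Z is a 2r-dependent process. We define a coupling $(Z,Z')$, where Z and Z' are copies of Z, by

$$(Z_j,Z'_j) := \big(\mathbf 1\{\theta(T^j\mathbf U)>r\},\ \mathbf 1\{\theta(T^j\mathbf U')>r\}\big),$$

where $\mathbf U$ and $\mathbf U'$ are i.i.d. processes with $U'_j = U_j$ for $j>r$, and $\{U_j\}_{j\le r}$ and $\{U'_j\}_{j\le r}$ independent. Therefore, again by Lemma 2, we have

$$\mathbb P\Big(\Big|\sum_{j=1}^{n}\alpha_j\mathbf 1\{\theta(T^j\mathbf U)>r\}-\mathbb P(\theta(\mathbf U)>r)\Big|>\frac{\epsilon}{4\|\delta f\|_{\ell^1(\mathbb N)}}\Big)\le2\exp\Big(-\frac{S_2(\epsilon,\gamma)}{\|\delta f\|^2_{\ell^2(\mathbb N)}}\Big), \tag{20}$$

where $S_2(\epsilon,\gamma) = \frac18\epsilon^2\big(1+r+\sum_{j\ge1}\mathbb P(\theta(\mathbf U)>j)\big)^{-2}$. Since $S_2\le S_1$, we have

$$\mathbb P\Big(\big|f(X_1^n)-\mathbb E[f(X_1^n)]\big|>\epsilon\Big)\le4\exp\Big(-\frac{S_2(\epsilon,\gamma)}{\|\delta f\|^2_{\ell^2(\mathbb N)}}\Big),$$

which proves the theorem.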
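To make the constants in $S_1$ and $S_2$ easier to check, here is the arithmetic behind them, written under the assumption that Lemma 2 is applied with exponent $2t^2/((1+\Delta)^2\|\delta g\|^2_{\ell^2(\mathbb N)})$: once to the linear function $g(z_1^n) = \sum_j\alpha_jz_j$ at level $t = \epsilon/(4\|\delta f\|_{\ell^1(\mathbb N)})$, and once to $g = f$ at level $t = \epsilon/2$.

```latex
\begin{aligned}
&\delta_j g = \alpha_j = \frac{\delta_j f}{\|\delta f\|_{\ell^1(\mathbb N)}}\quad (z_j\in\{0,1\}),
 \qquad
 \|\delta g\|_{\ell^2(\mathbb N)}^2 = \frac{\|\delta f\|_{\ell^2(\mathbb N)}^2}{\|\delta f\|_{\ell^1(\mathbb N)}^2},\\
&\frac{2t^2}{(1+\Delta)^2\|\delta g\|_{\ell^2(\mathbb N)}^2}
 = \frac{2\epsilon^2}{16\|\delta f\|_{\ell^1(\mathbb N)}^2}\cdot
   \frac{\|\delta f\|_{\ell^1(\mathbb N)}^2}{(1+\Delta)^2\|\delta f\|_{\ell^2(\mathbb N)}^2}
 = \frac{\epsilon^2/8}{(1+\Delta)^2\|\delta f\|_{\ell^2(\mathbb N)}^2},\\
&t = \frac{\epsilon}{2},\ g = f:\qquad
 \frac{2(\epsilon/2)^2}{(1+\Delta)^2\|\delta f\|_{\ell^2(\mathbb N)}^2}
 = \frac{\epsilon^2/2}{(1+\Delta)^2\|\delta f\|_{\ell^2(\mathbb N)}^2}.
\end{aligned}
```

Bounding $1+\Delta$ by $1+r+\sum_{j\ge1}\mathbb P(\theta(\mathbf U)>j)$ gives the factors 1/8 and 1/2, so the first exponent is the smaller one, which is why the two bounds merge into a single factor 4.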



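6.3. Proof of Proposition 1. We use the same notation as in the proof of Theorem 3. We define a coupling $(X,X')$ between two copies of X by

$$(X_j,X'_j) := \big([\Phi(\mathbf U)]_j,[\Phi(\mathbf U')]_j\big),\quad\text{for all } j\in\mathbb Z,$$

where $\mathbf U$ and $\mathbf U'$ are i.i.d. processes such that $U'_j = U_j$ for $j>0$, and $\{U_j\}_{j\le0}$ and $\{U'_j\}_{j\le0}$ are independent. From the definition of a CFTP algorithm, we can take $\Delta\le\sum_{j=1}^{\infty}\mathbb P(\theta(\mathbf U)\ge j) = \mathbb E[\theta]$. Applying Lemma 2 with this value of Δ yields the result. □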

6.4. Proof of Theorem 4. We say that a stationary process X has the positive divergence property if

$$\liminf_{n\to\infty}\frac{1}{n+1}\sum_{a\in\{-1,1\}^{n+1}}\mathbb P\big(Y_{-n}^0 = a_{-n}^0\big)\log\frac{\mathbb P(Y_{-n}^0 = a_{-n}^0)}{\mathbb P(X_{-n}^0 = a_{-n}^0)}>0$$

for any ergodic process Y different from X.

The proof of Theorem 4 is based on the following lemmas.

Lemma 3 (Marton & Shields (1994)). Let X be a stationary process and let the ergodic theorem hold at exponential rate. Then X has the positive divergence property.

Proof. If X has the concentration of measure at exponential rate, it satisfies the ergodic theorem at exponential rate. Therefore, Theorem 1 of Marton & Shields (1994) implies that X has the positive divergence property. □

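Lemma 4. Let X and Y be two stationary processes compatible with a continuous kernel P. Let $\inf_{a_{-\infty}^{0}}P(a_0\mid a_{-\infty}^{-1}) = \delta>0$. Then the relative entropy rate

$$\lim_{n\to\infty}\frac{1}{n+1}\sum_{a\in\{-1,1\}^{n+1}}\mathbb P\big(X_{-n}^0 = a_{-n}^0\big)\log\frac{\mathbb P(X_{-n}^0 = a_{-n}^0)}{\mathbb P(Y_{-n}^0 = a_{-n}^0)}$$

exists and is 0.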

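Proof. Let Z represent X or Y. Define, for $k\in\mathbb N$,

$$H_X(Z_{-k}^0) := -\sum_{a\in\{-1,1\}^{k+1}}\mathbb P\big(X_{-k}^0 = a_{-k}^0\big)\log\mathbb P\big(Z_{-k}^0 = a_{-k}^0\big).$$

Now, we can rewrite the relative entropy rate as

$$\lim_{n\to\infty}\frac{1}{n+1}\big\{H_X(Y_{-n}^0)-H_X(X_{-n}^0)\big\}.$$

Define also, for $k\in\mathbb N^*$,

$$H_X(Z_0\mid Z_{-k}^{-1}) := -\sum_{a\in\{-1,1\}^{k+1}}\mathbb P\big(X_{-k}^0 = a_{-k}^0\big)\log\mathbb P\big(Z_0 = a_0\mid Z_{-k}^{-1} = a_{-k}^{-1}\big).$$

By the chain rule and stationarity of the processes, we have

$$H_X(Z_{-n}^0) = H_X(Z_0)+\sum_{k=1}^{n}H_X(Z_0\mid Z_{-k}^{-1}).$$

Therefore, the relative entropy rate is a difference between two Cesàro averages. To prove that the relative entropy rate exists, it is enough to show that the limit

$$\lim_{n\to\infty}H_X(Z_0\mid Z_{-n}^{-1})$$

exists. To see that $H_X(Z_0\mid Z_{-n}^{-1})$ converges, let $\mu_Z$ be the measure associated with Z; we have

$$H_X(Z_0\mid Z_{-n}^{-1}) = -\mathbb E_X\big(\log\mu_Z(X_0\mid X_{-n}^{-1})\big).$$

By assumption, for all $a\in\{-1,1\}^{\mathbb Z}$,

$$-\log\mu_Z(a_0\mid a_{-n}^{-1})\le-\log\delta;$$

therefore, by the dominated convergence theorem,

$$\lim_{n\to\infty}-\mathbb E_X\big(\log\mu_Z(X_0\mid X_{-n}^{-1})\big) = -\mathbb E_X\Big(\lim_{n\to\infty}\log\mu_Z(X_0\mid X_{-n}^{-1})\Big).$$

By continuity of P, we have that, for all $a\in\{-1,1\}^{\mathbb Z}$,

$$\lim_{n\to\infty}\log\mu_Z(a_0\mid a_{-n}^{-1}) = \log P(a_0\mid a_{-\infty}^{-1}).$$

Hence,

$$\lim_{n\to\infty}H_X(Z_0\mid Z_{-n}^{-1}) = -\mathbb E_X\big(\log P(X_0\mid X_{-\infty}^{-1})\big).$$

Since this limit is the same for Z = X and Z = Y, the two Cesàro averages converge to the same value, so the relative entropy rate exists and equals 0, which concludes the proof. □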
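The chain-rule decomposition used above can be checked numerically in the simplest setting: two stationary first-order Markov chains on {-1, 1} with strictly positive kernels (an illustrative assumption; here the two chains have different kernels, so the per-step term is positive, whereas in Lemma 4 both chains share the kernel P and that term vanishes in the limit). The Python sketch compares a brute-force block divergence with "divergence of the first symbol + n times the expected one-step divergence"; all names are hypothetical, and states 0 and 1 stand for -1 and +1.

```python
import math
from itertools import product
from typing import List, Sequence

Kernel = Sequence[Sequence[float]]  # K[x][y], states 0 and 1 standing for -1 and +1


def stationary(K: Kernel) -> List[float]:
    """Stationary law of a two-state chain with K[0][1], K[1][0] > 0."""
    a, b = K[0][1], K[1][0]
    return [b / (a + b), a / (a + b)]


def block_prob(word: Sequence[int], K: Kernel) -> float:
    """Probability of the word under the stationary chain with kernel K."""
    prob = stationary(K)[word[0]]
    for x, y in zip(word, word[1:]):
        prob *= K[x][y]
    return prob


def block_divergence(n: int, P: Kernel, Q: Kernel) -> float:
    """Sum over words of length n + 1 of P(word) * log(P(word) / Q(word)), by brute force."""
    return sum(block_prob(w, P) * math.log(block_prob(w, P) / block_prob(w, Q))
               for w in product((0, 1), repeat=n + 1))


def chain_rule(n: int, P: Kernel, Q: Kernel) -> float:
    """Divergence of the first symbol plus n times the stationary expected one-step divergence."""
    pi, rho = stationary(P), stationary(Q)
    d0 = sum(pi[x] * math.log(pi[x] / rho[x]) for x in (0, 1))
    rate = sum(pi[x] * P[x][y] * math.log(P[x][y] / Q[x][y]) for x in (0, 1) for y in (0, 1))
    return d0 + n * rate
```

Dividing by n + 1, the block divergence converges to the one-step term, which is the Markov analogue of the Cesàro argument in the proof.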

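Proof of Theorem 4. If X has the concentration of measure holding at exponential rate, then, by Lemma 3, X has the positive divergence property. By Lemma 4 (applied with the roles of X and Y exchanged), any ergodic process Y compatible with P has zero relative entropy rate with respect to X; hence, if X has the positive divergence property, there is no other ergodic process compatible with P. This concludes the proof. □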
6.5. Proof of Theorem 5. The kernel of the original BK example (see Section 2) reads as follows:

$$P^{BK}(+1\mid a_{-\infty}^{-1}) = \sum_{k\ge1}\lambda_k\Big[(1-\epsilon)\,\mathbf 1\Big\{\sum_{i=1}^{m_k}a_{-i}>0\Big\}+\epsilon\,\mathbf 1\Big\{\sum_{i=1}^{m_k}a_{-i}<0\Big\}\Big], \tag{21}$$

with $\lambda_k = c\,r^k$ for some positive constant c and $r\in(2/3,1)$. In other words, the BK example is already given in the form of a countable mixture of Markov kernels with lacunary ranges $m_k$, $k\ge1$. Here, the parameter of the kernel is the sequence $\{m_k\}_{k\ge1}$: taking it increasing sufficiently fast, Bramson & Kalikow (1993) proved that there is phase transition. On the other hand, taking this sequence increasing slowly, the resulting kernel satisfies uniqueness, and we denote by $X^{BK}$ the unique stationary chain compatible with $P^{BK}$ in this case.

We will prove Theorem 5 in two steps. First, Lemma 5 states that $X^{BK}$ is a finitary coding of an i.i.d. process with discrete state space and finite entropy. Then, Lemma 6 (stated and proved in Appendix A) states that i.i.d. chains with finite entropy are finitary codings of i.i.d. chains on a sufficiently large, yet finite, alphabet. This latter result is known (Rudolph 1982), but our proof is new and simple, and we think that it is worth presenting here. Since it has a life of its own, we decided to put the statement and proof of this lemma in an appendix.

We will conclude the proof of the theorem by observing that if a process X is a finitary coding of a process Y, which is itself a finitary coding of a process Z, then X is also a finitary coding of Z.

Let $L^{BK}$ be an i.i.d. process taking values in the alphabet $\mathcal A := \{0_-,0_+\}\cup\{1,2,\dots\}$, with distribution $Q(L_0^{BK} = 0_-) = Q(L_0^{BK} = 0_+) = \epsilon$ and $Q(L_0^{BK} = k) = \lambda_k(1-2\epsilon)$ for $k\ge1$. Observe that $L^{BK}$ has finite entropy, since

$$H(L_0^{BK}) := -\sum_{a\in\mathcal A}Q(L_0^{BK} = a)\log Q(L_0^{BK} = a) = -2\epsilon\log\epsilon-(1-2\epsilon)\log(1-2\epsilon)-(1-2\epsilon)\sum_{k\ge1}\lambda_k\log\lambda_k$$

is finite for the sequence $\{\lambda_k\}_{k\ge1}$ we consider here (in the original BK example).
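For the geometric weights of the original BK example, the series in the entropy formula has a closed form, using $\sum_{k\ge1}k\lambda_k = 1/(1-r)$, which makes its finiteness explicit. The Python sketch compares a direct (truncated) computation, the formula above, and that closed form. The normalisation $c = (1-r)/r$, which makes $\sum_{k\ge1}\lambda_k = 1$, the truncation level and the values ε = 0.1, r = 0.8 are our assumptions; `bk_entropy` is a hypothetical name.

```python
import math
from typing import Tuple


def bk_entropy(eps: float, r: float, K: int = 400) -> Tuple[float, float, float]:
    """Entropy of L_0^BK with lambda_k = c r^k and c = (1 - r) / r, so that sum_k lambda_k = 1.

    Returns (direct sum truncated at K, formula truncated at K, closed form).
    """
    c = (1 - r) / r
    lam = [c * r ** k for k in range(1, K + 1)]
    masses = [eps, eps] + [l * (1 - 2 * eps) for l in lam]
    direct = -sum(m * math.log(m) for m in masses)
    formula = (-2 * eps * math.log(eps) - (1 - 2 * eps) * math.log(1 - 2 * eps)
               - (1 - 2 * eps) * sum(l * math.log(l) for l in lam))
    # -sum_k lambda_k log lambda_k = -log(c) - log(r) * sum_k k lambda_k = -log(c) - log(r) / (1 - r)
    closed = (-2 * eps * math.log(eps) - (1 - 2 * eps) * math.log(1 - 2 * eps)
              + (1 - 2 * eps) * (-math.log(c) - math.log(r) / (1 - r)))
    return direct, formula, closed
```

The truncation error is of order $r^K$, far below double precision for the default K.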

Lemma 5. $X^{BK}$ is a finitary coding of $L^{BK}$.

Proof. We first rewrite the kernel $P^{BK}$ in the following way:

$$P^{BK}(+1\mid a_{-\infty}^{-1}) = \epsilon+(1-2\epsilon)\sum_{k\ge1}\lambda_k\,\mathbf 1\Big\{\sum_{i=1}^{m_k}a_{-i}>0\Big\}. \tag{22}$$

This decomposition means that the next symbol of the chain can be constructed with the following two-step procedure: (1) generate $L_0^{BK}$ with distribution Q, and (2) if $L_0^{BK} = 0_-$ or $0_+$, put $-1$ or $+1$, respectively; otherwise, if $L_0^{BK} = k\ge1$, put the symbol that occurs most often in $x_{-m_k}^{-1}$. This procedure motivates the use of the following update function for the coupling between the chain with fixed past $-1$ and the chain with fixed past $+1$:

$$F^{BK}\big((a,b),L^{BK}\big) := \begin{cases}(-1,-1) & \text{if } L^{BK} = 0_-,\\ (+1,+1) & \text{if } L^{BK} = 0_+,\\ \big(\mathrm{maj}(a_{-m_k}^{-1}),\ \mathrm{maj}(b_{-m_k}^{-1})\big) & \text{if } L^{BK} = k\ge1,\end{cases} \tag{23}$$

where $\mathrm{maj}(a_{-m_k}^{-1})$ denotes the symbol of $\{-1,+1\}$ that occurs most often in $a_{-m_k}^{-1}$. This update function defines a coupling kernel which satisfies (6) for any pair of ordered pasts $a\le b$, and therefore can be used to perform the Attractive Sampler. It follows, using the same proof as that of Theorem 2, that $X^{BK}$ is a finitary coding of $L^{BK}$ via the Attractive Sampler. □
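The property that makes (23) usable in the Attractive Sampler, namely that ordered pasts $a\le b$ always produce ordered outputs, can be checked exhaustively on short pasts. In this Python sketch the ranges $m_1, m_2, m_3 = 1, 3, 5$ are hypothetical (taken odd, as an assumption, so that the majority has no ties), pasts are tuples written oldest symbol first, and `maj` and `f_bk` are illustrative names.

```python
from itertools import product
from typing import Sequence, Tuple, Union

Label = Union[str, int]  # "0-", "0+" or an integer k >= 1


def maj(past: Sequence[int], m: int) -> int:
    """Symbol of {-1, +1} occurring most often among the last m symbols (m odd, so no ties)."""
    return 1 if sum(past[-m:]) > 0 else -1


def f_bk(a: Sequence[int], b: Sequence[int], label: Label, ranges: Sequence[int]) -> Tuple[int, int]:
    """Update function (23); pasts are listed oldest first, so past[-1] is the symbol at time -1."""
    if label == "0-":
        return (-1, -1)
    if label == "0+":
        return (1, 1)
    m = ranges[label - 1]  # the range m_k attached to label k
    return (maj(a, m), maj(b, m))
```

Since the majority over a fixed window is a monotone function of the past, the same label can never reverse the order of two ordered pasts, which is exactly what the exhaustive check confirms.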
Appendix A. Finitary coding from a finite alphabet i.i.d. process to a countable alphabet i.i.d. process

Lemma 6. An $\mathbb N$-valued i.i.d. process having finite entropy can be constructed as a finitary coding of an i.i.d. process with finite alphabet.

We will explicitly construct the coding from a sequence $\mathcal V$ of piles of N unbiased random binary bits (N is a sufficiently large integer that we will fix); that is, we construct a finitary coding from an i.i.d. process taking values uniformly in $\{0,1\}^N$ to the $\mathbb N$-valued i.i.d. process, which we denote by Y. This algorithm is in the spirit of that of Harvey et al. (2007). The proof that this coding is actually finitary uses the techniques developed in Ferrari et al. (2000). In fact, we will show more: the coding length has finite expectation.

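Before we present the algorithm, we need some more definitions.

A simulation of $q = \{q(k)\}_{k\ge0} = \{q_k\}_{k\ge0}$ (a distribution on $\mathbb N$) from independent unbiased bits (law $\mathbb P$) is a pair of measurable functions $T:\{0,1\}^{\mathbb N^*}\to\mathbb N$ and $S:\{0,1\}^{\mathbb N^*}\to\mathbb N$, defined $\mathbb P$-a.s., with the following properties:

(1) If $a_1^{+\infty},b_1^{+\infty}\in\{0,1\}^{\mathbb N^*}$ are such that $a_1^{T(a_1^{+\infty})} = b_1^{T(a_1^{+\infty})}$, then $T(a_1^{+\infty}) = T(b_1^{+\infty})$ and $S(a_1^{+\infty}) = S(b_1^{+\infty})$.
(2) Under the measure $\mathbb P$, $S(a_1^{+\infty})$ has distribution q.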

T is called the stopping time of the simulation, and S is called the output symbol. We do not give the explicit definition of such a pair of functions in order to save space; we refer the interested reader to the papers of Knuth & Yao (1976), Romik (1999) and Harvey et al. (2007) for further details. For any given distribution q, there is a fixed $m^*\ge1$ such that $T(a_1^{+\infty})\ge m^*$ for any $a_1^{+\infty}$. On the other hand, we have the following result, which follows, for example, from Item (ii) of Theorem 1 in Romik (1999).

Lemma 7. Assume that $-\sum_{k\ge0}q_k\log q_k<+\infty$. Then there exists a pair (T, S) such that

$$\mathbb ET := \mathbb E\,T(a_1^{+\infty})<+\infty.$$
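Since the explicit pair (T, S) is omitted above, here is one elementary construction for a finitely supported q with rational weights, as a Python sketch. It is the inversion (interval-refinement) method, not the Knuth–Yao tree of the cited papers, and we make no claim here about how its expected number of bits compares with the bound of Lemma 7. Property (1) holds because the decision to stop and the output depend only on the bits read so far; property (2) holds because the bits are the binary expansion of a uniform variable. `simulate` and `exact_law` are hypothetical names.

```python
from fractions import Fraction
from itertools import accumulate, product
from typing import Iterable, List, Optional, Sequence, Tuple


def simulate(q: Sequence[Fraction], bits: Iterable[int]) -> Optional[Tuple[int, int]]:
    """Read unbiased bits until the dyadic interval they determine fits inside one cell of q.

    Returns (T, S), the number of bits read and the output symbol, or None if the bits run out first.
    """
    cdf = [Fraction(0)] + list(accumulate(q))  # the cell of symbol k is [cdf[k], cdf[k + 1])
    if cdf[-1] != 1:
        raise ValueError("q must sum to 1")
    low, width = Fraction(0), Fraction(1)
    for t, bit in enumerate(bits, start=1):
        width /= 2
        low += bit * width  # current dyadic interval is [low, low + width)
        for k in range(len(q)):
            if cdf[k] <= low and low + width <= cdf[k + 1]:
                return t, k
    return None  # stopping time not reached within the given bits


def exact_law(q: Sequence[Fraction], depth: int) -> List[Fraction]:
    """P(S = k, T <= depth), obtained by weighting every bit string of length depth by 2**-depth."""
    law = [Fraction(0)] * len(q)
    for word in product((0, 1), repeat=depth):
        out = simulate(q, word)
        if out is not None:
            law[out[1]] += Fraction(1, 2 ** depth)
    return law
```

For a dyadic q the stopping time is bounded, while for a weight such as 1/3 the probability of not having stopped after d bits is $2^{-d}$.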



For the piles at each site, we use the notation $V_k := a_1^{(k)}\dots a_N^{(k)}$, where $a_i^{(k)}\in\{0,1\}$ for any $i\in\{1,\dots,N\}$ and $k\in\mathbb Z$. For any $k\in\mathbb Z$, let us denote by $(T_k,S_k)$ the simulation of site k. The sequence of bits read by $(T_k,S_k)$ is a subset of the piles of bits $V_i$, $i\le k$. We now explain this, first informally (borrowing from Harvey et al. (2007)).

Observation 2. When we say that we use the bits of a pile, we mean that we use them from the bottom to the top. If we use bits of a pile that has already been partially used, we begin from the lowest unused one and climb up the pile.



Algorithm inspired by Harvey et al. (2007). For each site $k\in\mathbb Z$, $(T_k,S_k)$ attempts to use the random bits in $V_k$ to simulate $Y_k$. At many sites $k\in\mathbb Z$, the stopping time $T_k$ will be reached, that is, $T_k(a_1^{(k)}\dots a_N^{(k)}a_{N+1}^{+\infty})\le N$ for any $a_{N+1}^{+\infty}$. If the stopping time is reached, $V_k$ may contain unused bits, which are still independent and unbiased; that is, if $T_k$ is reached after reading $i<N$ bits of $V_k$, then $a_{i+1}^{(k)},\dots,a_N^{(k)}$ are still unused, independent and unbiased. For any site k whose stopping time $T_k$ is not reached, look at $V_{k-1}$ in the next site to the left (that is, the simulation goes backwards in time) to find unused bits to continue the simulation. If the stopping time $T_k$ is now reached, compute $Y_k$. If not, iterate, looking one site further to the left at each step for unused bits, until the stopping time is reached. This iteration is done simultaneously for all sites, in order to maintain translation-equivariance of the construction.

More formally, the strings of bits read by $(T_k,S_k)$, $k\ge0$, when we limit ourselves to looking at the piles $V_i$, $i\ge0$, are defined as follows:

$$\begin{aligned}
z^{0\to0} &:= a_1^{(0)}\dots a_N^{(0)},\\
z^{0\to1} &:= a_1^{(1)}\dots a_N^{(1)}\,z^{0\to0}_{T_0(z^{0\to0})+1}\dots z^{0\to0}_{|z^{0\to0}|},\\
z^{0\to2} &:= a_1^{(2)}\dots a_N^{(2)}\,z^{0\to1}_{T_1(z^{0\to1})+1}\dots z^{0\to1}_{|z^{0\to1}|},\\
&\ \ \vdots\\
z^{0\to k} &:= a_1^{(k)}\dots a_N^{(k)}\,z^{0\to k-1}_{T_{k-1}(z^{0\to k-1})+1}\dots z^{0\to k-1}_{|z^{0\to k-1}|},
\end{aligned}$$

where, for any $k\ge0$, the length of the string $z^{0\to k}$ is denoted by $|z^{0\to k}|$, and we use the convention that $T_k(z^{0\to k}) = \infty$ when the simulation is not successful using only the string $z^{0\to k}$; otherwise, $T_k(z^{0\to k}) = i\le|z^{0\to k}|$ is the smallest integer such that $T_k(z^{0\to k}b_1^{+\infty}) = i$ for any $b_1^{+\infty}$.
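The recursion defining $z^{0\to k}$ translates directly into a loop over the piles: each site reads its own pile first and then the unused tail inherited from the left. The Python sketch below assumes a finitely supported q with rational weights and uses an inversion simulator as an illustrative stand-in for $(T_k,S_k)$; it shows only the one-sided construction on the piles $V_0,V_1,\dots$, not the coding through $\tau(\mathcal V)[0]$. All names are hypothetical.

```python
from fractions import Fraction
from itertools import accumulate
from typing import List, Optional, Sequence, Tuple


def simulate(q: Sequence[Fraction], bits: Sequence[int]) -> Optional[Tuple[int, int]]:
    """Inversion simulator (T, S) on a finite string; None plays the role of T = infinity."""
    cdf = [Fraction(0)] + list(accumulate(q))
    low, width = Fraction(0), Fraction(1)
    for t, bit in enumerate(bits, start=1):
        width /= 2
        low += bit * width
        for k in range(len(q)):
            if cdf[k] <= low and low + width <= cdf[k + 1]:
                return t, k
    return None


def forward_piles(q: Sequence[Fraction], piles: Sequence[Sequence[int]]) -> List[Optional[Tuple[int, int]]]:
    """Sites 0, 1, ...: z^{0->k} is pile k followed by the unused tail of z^{0->k-1}."""
    leftover: List[int] = []
    results: List[Optional[Tuple[int, int]]] = []
    for pile in piles:
        z = list(pile) + leftover  # own pile first, then unused bits coming from the left
        out = simulate(q, z)
        results.append(out)
        # With T_k = infinity nothing is passed on; otherwise the bits after position T_k are.
        leftover = [] if out is None else z[out[0]:]
    return results
```

A stopping time larger than N, as in the first check below, is exactly a site that had to borrow bits from the pile on its left.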
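We now define

$$\tau(\mathcal V)[0] := \max\big\{j\le0 : T_i(z^{j\to i})\le|z^{j\to i}|,\ i = j,\dots,0\big\}, \tag{24}$$

where $z^{j\to i}$ is defined as $z^{0\to i}$ above, with the construction started from pile $V_j$ instead of $V_0$, and

$$[S(\mathcal V)]_0 := S_0\big(z^{\tau(\mathcal V)[0]\to0}\big).$$

When $\tau(\mathcal V)[0]$ is a.s. finite, the construction is feasible, and the algorithm constructs a sample of the target distribution, that is, of the product measure $q^{\otimes\mathbb Z}$. This is because the strings of unbiased bits $z^{\tau[i]\to i}$, $i\in\mathbb Z$, are all finite and disjoint, and the simulators $(T_i,S_i)$, $i\in\mathbb Z$, are designed to read strings of unbiased bits and transform them into symbols distributed according to q. Thus, in order to prove Lemma 6, it is enough to prove that $\tau(\mathcal V)[0]$ is a.s. finite.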

Proof of Lemma 6. Let us first show that

$$\mathbb P(\mathcal E) := \mathbb P\big(T_0(z^{0\to0})\le|z^{0\to0}|,\ T_1(z^{0\to1})\le|z^{0\to1}|,\ \dots,\ T_j(z^{0\to j})\le|z^{0\to j}|,\ \dots\big)>0.$$

Observe that on the event $\mathcal E$, all the strings $z^{0\to j}$ are finite. More specifically, we have $|z^{0\to0}|\le N$ and $|z^{0\to j}|\le(j+1)N-\sum_{i=0}^{j-1}T_i(z^{0\to i})$ for any $j\ge1$. Thus,

$$\mathbb P(\mathcal E) = \mathbb P\Big(T_0(z^{0\to0})\le N,\ T_0(z^{0\to0})+T_1(z^{0\to1})\le2N,\ \dots,\ \sum_{i=0}^{j}T_i(z^{0\to i})\le(j+1)N,\ \dots\Big).$$

But $\{z^{0\to j}\}_{j\ge0}$ constitutes a sequence of disjoint strings of unbiased independent random bits. For this reason, for any sequence of independent infinite strings of binary bits $\{a^{(i)}\}_{i\ge0}$, $a^{(i)} := a_0^{(i)}a_1^{(i)}\dots$, we have

$$\mathbb P(\mathcal E) = \mathbb P\Big(T_0(z^{0\to0}a^{(0)})\le N,\ T_0(z^{0\to0}a^{(0)})+T_1(z^{0\to1}a^{(1)})\le2N,\ \dots,\ \sum_{i=0}^{j}T_i(z^{0\to i}a^{(i)})\le(j+1)N,\ \dots\Big).$$

Now, $\{T_j(z^{0\to j}a^{(j)})\}_{j\ge0}$ is a sequence of i.i.d. random variables taking values in $\{m^*,m^*+1,\dots\}$ (we recall that $m^*$ is the smallest number of bits necessary to simulate from q), with expectation $\mathbb ET_0(z^{0\to0}a^{(0)})<N$; this is where N is fixed, which is possible by Lemma 7. Denoting $C_i := T_{i-1}(z^{0\to i-1}a^{(i-1)})-N$ for $i\ge1$, we obtain an i.i.d. sequence C (where $C_i\in\{-N+m^*,\dots\}$) such that $\mathbb EC_1<0$. Therefore,

$$\mathbb P(\mathcal E) = \mathbb P\big(C_1\le0,\ C_1+C_2\le0,\ C_1+C_2+C_3\le0,\ \dots\big)>0$$

is true because $\sum_{i=1}^{n}C_i$ is a random walk with negative drift, which is transient.

Now, define the chain ξ on {0, 1} by $\xi_j := \mathbf 1\{j = \tau[j,+\infty]\}$. Then, consider the sequence of time indexes $\{\tau_l\}_{l\in\mathbb Z}$ defined by $\xi_j = 1$ if and only if $j = \tau_l$ for some $l\in\mathbb Z$, with $\tau_l<\tau_{l+1}$ and the convention $\tau_0\le0<\tau_1$. We will now prove that the chain ξ is a renewal process (that is, the increments $\{\tau_{l+1}-\tau_l\}_{l\in\mathbb Z}$ are independent, and identically distributed for $l\ne0$, with finite expectation). Define, for any $-\infty<m\le n\le+\infty$, the events

$$H[m,n] := \big\{T_i(z^{m\to i})\le|z^{m\to i}|,\ i\in\{m,\dots,n\}\big\},$$

which are measurable with respect to the σ-algebra generated by $V_m^n$. By definition (24) of $\tau[m,n]$, $-\infty<m\le n\le+\infty$, we have, for any $t_1<t_2<\dots<t_n$,

$$\bigcap_{l=1}^{n}\{\tau[t_l,+\infty] = t_l\} = \bigcap_{l=1}^{n}H[t_l,t_{l+1}-1], \tag{25}$$

where $t_{n+1} := +\infty$. This is an intersection of independent events. Then, we observe that, by stationarity,

$$\mathbb P(H[j,+\infty]) = \mathbb P(H[0,+\infty]) = \mathbb P(\mathcal E)>0$$

and

$$\mathbb P(H[-j,-1]) = \mathbb P\big(H[-j,+\infty]\mid H[0,+\infty]\big),\quad\forall j\ge1.$$

Together with (25), this yields, for any sequence of integers $t_1<t_2<\dots<t_n$,

$$\mathbb P(\xi_{t_l} = 1,\ l = 1,\dots,n) = \mathbb P(\mathcal E)\prod_{l=1}^{n-1}\mathbb P\big(\xi_{t_{l+1}-t_l} = 1\mid\xi_0 = 1\big),$$

and therefore the chain ξ is a renewal process (with positive renewal probability). It follows that the increments $\{\tau_{l+1}-\tau_l\}_{l\in\mathbb Z}$ are independent, and identically distributed for $l\ne0$, with finite expectation $\mathbb P(\mathcal E)^{-1}$. We conclude that $\tau[0]$ has finite expectation since

$$\mathbb P(\tau_{l+1}-\tau_l>m) = \mathbb P\big(\tau[1,+\infty]<-m\mid\tau[0,+\infty] = 0\big) = \mathbb P(\tau[0]<-m+1).$$

□

Acknowledgments. We gratefully acknowledge A. Galves, R. Fernandez, K. Marton and S. Friedli 